A.  STATEMENT  OF  THE  PROBLEM


The  Advanced  Technology  Laboratories  (ATL)  of  RCA  has 
undertaken  the  task  of  developing  sophisticated  algorithms 
for  autonomous  target  acquisition  and  tracking.  When  this 
contract  was  awarded,  we  had  already  demonstrated  in  a 
simulation  testbed  that  a  statistical  segmentation  algorithm 
using  multiple  features  for  classification  performs  well  on 
low  resolution  infrared  images  of  tanks.  This  work  followed 
and extended the work of Dr. Gerald Flachs, Dr. Alton
Gilbert,  and  others,  working  at  White  Sands  Missile  Range 
with  Army  Research  Office  support.  We  proposed  with  this 
contract  to  investigate  improvements  to  the  tracker  system, 
and evolve towards a real-time hardware implementation.

B.  SUMMARY  OF  IMPORTANT  RESULTS 

The  goals  of  the  original  contract  were  met  within  two 
years  of  the  contract  award.  At  that  time,  several  analytical 
and  experimental  investigations  had  been  conducted.  The 
tracker  simulator  had  gone  through  several  revisions,  as  new 
concepts  were  invented  and  tested.  A  report  describing  many 
of  the  facets  of  the  statistical  tracking  system  that  were 
worked out under this contract is attached as an appendix.

The  major  accomplishments  that  occurred  during  the  course 
of  the  contract  were: 

1.  An  adaptive  gate  process  was  implemented  interior  to 
the  tracking  window.  The  technique  effectively  controls 
the  size  of  the  tracking  window. 

2.  The  idea  of  intelligently  defining  varying  weights  for 
the misclassification costs C(T/B) and C(B/T) within


the  tracking  window  was  implemented  and  shown  to 
improve  tracking  in  multiple  target  situations.  The 
technique  was  later  dropped  to  allow  for  a  new  method 
of  controlling  the  decision  threshold. 

Initial  simulations  revealed  that  a  fixed  threshold 
when  computing  the  decision  rule  that  segments  out  the 
target from the background was inadequate. We observed
that  any  fixed  threshold  would  at  some  times  grossly 
undersegment  the  target  and  at  other  times  grossly 
oversegment  the  target.  Given  that  there  is  a  need  for 
varying  the  decision  threshold,  two  separate  issues 
arise:  what  criteria  to  use  to  control  the  threshold, 
and  how  to  control  the  threshold  to  satisfy  the 
criteria.  One  of  the  accomplishments  of  the  contract 
was  the  development  of  a  satisfactory  solution  to  both 
issues. 

The criterion by which the threshold is now controlled
is essentially Neyman-Pearson: the probability of
classifying a target pixel as target (detection
probability) is maximized under the constraint that the
probability of misclassifying a background pixel (false
alarm rate) is held to a fixed level. Additional
constraints were added that guarantee that at least
some pixels will be labeled target.

A  computational  procedure  was  developed  that  allows  the 
threshold  to  be  computed  efficiently  from  the  target 
and  background  histogram  data.  This  development  will 
allow  the  technique  to  be  easily  implemented  in 
real-time. 

A study was conducted of performance measures that
might be used to sense, and ideally predict,
breaklock. Many potential performance measures were


defined  and  tested  with  both  Gaussian  and  real  image 
sequences.  Several  potentially  useful  statistical 
segmentation  measures  were  found.  The  results  of  this 
investigation  were  then  applied  to  the  problem  of 
assigning  the  number  of  bits  to  features. 

A coast mode was developed and implemented in the
simulator, which significantly extended the tracking
situations that can be successfully handled. The coast
mode consists of three additions to the tracker
simulator: first, a Kalman filter that maintains a
running estimate of the target trajectory; second, a
performance monitor that is used to detect breaklock
conditions; and last, a reacquisition strategy that
searches for a target based on size, statistical
signature, and trajectory.

In  preparation  for  a  proposal  for  a  real-time, 
fieldable  implementation  of  the  tracker  system,  the 
simulator  was  used  to  finalize  the  version  of  the 
tracker  to  be  proposed.  At  this  time,  this  real-time 
system  is  being  designed  under  contract  with  the  U.  S. 
Army  Missile  Command  (MICOM).  As  a  result  of  our 
efforts  with  the  simulator,  we  were  able  to  include 
many  of  the  concepts  described  above  (particularly 
breaklock detection and reacquisition) into the
fieldable  system. 

An  investigation  was  conducted  on  image  representation 
and  matching  for  motion  analysis,  which  will  have 
application  to  tracking  and  autonomous  navigation.  When 
applied  to  navigation,  the  motion  occurs  as  the  result 
of  a  moving  sensor  observing  a  stationary  scene.  The 
desired objective is to detect the motion of certain
points in the scene, and analyze this motion to provide
ranging or depth information and coarse object


segmentation.  The  formulated  approach  is  to  employ 
multiple  processes  on  the  images  as  follows:  Laplacian 
pyramid,  contour  extraction  and  representation, 
structural  matching,  displacement  field  analysis,  and 
object  segmentation.  Some  preliminary  results  were 
obtained  in  the  development  of  the  contour  extraction 
and  representation  process. 
1.0  INTRODUCTION


This  paper  will  describe  the  video  tracking  system 
developed by the Advanced Technology Laboratories of RCA
Corporation  on  IRAD  and  under  Army  Research  Office  contract.  A 
major  theme  is  to  explore  the  tradeoffs  and  interplay  between 
the  three  major  influences  in  the  development  of  the  tracker 
algorithm:  theoretical  pattern  recognition,  human  intuition, 
and finally the constraint that the tracker be implementable
in real time; that is, the tracker must be able to accept a
standard video signal and process every video frame. The impact
of this real-time constraint will be highlighted throughout
this paper.

The  development  of  the  tracker  algorithm  took  place  from 
1981  to  1983  using  a  tracker  system  software  simulator  written 
in FORTRAN. The video inputs to the simulator were obtained by
digitizing  interesting  sequences  of  video  containing  targets 
such  as  tanks,  trucks,  and  airplanes.  The  video  was  generated 
by several types of sensors, in both the infrared and visible
bands. One of the important capabilities of this tracker is its
ability  to  work  with  a  variety  of  targets  and  sensors.  The 
reason  it  can  be  flexible  lies  in  the  generality  of  the 
segmentation process, which does not rely on any a priori
knowledge  of  the  sensor  or  target  type. 

RCA  has  a  contract  with  the  U.S.  Army  Missile  Command 
(MICOM)  to  build  the  real-time  tracker  system  that  so  far  has 
only  been  simulated. 


2.0  OVERVIEW  OF  TRACKER  ALGORITHM 


A  functional  block  diagram  of  the  statistical  tracker  is 
shown  in  Figure  1.  The  major  subsections  of  the  diagram  are  the 
front  end  feature  generator  (video  preprocessing,  median 
filter,  feature  computation,  scaling),  the  segmentation 
processing  (histogram  processing,  decision  rule  computation, 
segmenter),  high  level  control  (centroid,  adaptive  gate  window 
control,  Kalman  filter,  mode  control),  and  search  processor. 

The  heart  of  the  tracker  is  the  statistical  segmenter, 
which  examines  the  image  data  and  identifies  those  pixels  that 
are  target.  The  basic  problem  solved  by  the  statistical 
segmenter  is  illustrated  in  Figure  2.  Within  the  track  window 
each  pixel  can  be  either  target  or  background.  Target  pixels 
have  one  distribution  of  feature  vectors  (Figure  2  illustrates 
a  single  intensity  feature)  while  background  pixels  most  likely 
have  some  other  distribution  of  feature  vectors.  Assuming  the 
segmenter  can  learn  what  these  distributions  are,  the  problem 
is  to  use  them  to  divide  up  the  feature  space  into  target  and 
background  regions.  The  statistical  segmenter  uses  a  likelihood 
ratio  test  to  generate  the  target/background  decisions.  The 
complete process is described in a later section. To maintain
good estimates of the feature distributions, the tracker must
constantly update the window size and position to keep the
target enclosed. The tracker incorporates a Kalman filter to
estimate the velocity of the target. The velocity estimate
modifies the position estimate from the last frame to establish
a prediction window for the target location. This velocity
prediction is especially important for targets with high
line-of-sight rates and in the reacquisition mode after
breaklock has occurred.


To  further  increase  the  robustness  of  the  tracker,  a 
breaklock  detection  and  target  reacquisition  capability  has 
been  developed.  Breaklock  is  detected  by  examining  the  quality 
of  the  segmentation  over  a  succession  of  frames.  The  target  is 
reacquired  by  searching  the  binary  target  image  for  a  target  of 
appropriate  size  by  correlation  with  a  binary  mask. 


3.0  SUBSYSTEM  DESCRIPTIONS 


3.1  FEATURE  GENERATION 

The steps in feature generation are front end video
preprocessing to digitize the video, median filtering (which
can optionally be bypassed), feature computation, and
scaling/combining.

The  preprocessing  consists  of  global  automatic  gain 
control,  level  correction  and  analog  to  digital  conversion  of 
the  input  video.  The  output  is  a  two  dimensional  image, 
image(i,j),  60  times  per  second. 

The  median  filter  computes  the  3  by  3  median  of  medians. 
Let  the  output  of  the  median  filter  be  denoted  by  pimage(i,j). 
To compute the value of the median at position i,j, we first
compute the median (middle valued sample) of each of the three
sets of intensities:

image(i-1,j-1), image(i-1,j), image(i-1,j+1)
image(i,j-1),   image(i,j),   image(i,j+1)
image(i+1,j-1), image(i+1,j), image(i+1,j+1)


We  then  compute  the  median  of  these  three  median  values. 
The  primary  benefit  of  median  filtering  is  for  noise  reduction 
when  tracking  targets  in  noisy  video. 
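
As an illustration only, the median of medians described above might be sketched in Python as follows. The simulator itself was written in FORTRAN; the function names and the choice to copy border pixels unchanged are assumptions made for this sketch, not details taken from the tracker.

```python
def median3(a, b, c):
    """Middle-valued sample of three numbers."""
    return sorted((a, b, c))[1]


def median_of_medians_3x3(image):
    """Apply the 3 by 3 median of medians to a 2-D list of intensities.

    Border pixels are copied unchanged (an assumption; the text does not
    say how the borders are handled).
    """
    rows, cols = len(image), len(image[0])
    pimage = [row[:] for row in image]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Median of each of the three rows of the 3 by 3 neighborhood.
            row_medians = [
                median3(image[r][j - 1], image[r][j], image[r][j + 1])
                for r in (i - 1, i, i + 1)
            ]
            # Median of the three row medians.
            pimage[i][j] = median3(*row_medians)
    return pimage


# A single bright noise spike in a flat background is removed.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
filtered = median_of_medians_3x3(noisy)
```

The median of medians needs only three-input sorts, which is presumably part of its appeal for a real-time implementation; it is not in general identical to the true nine-sample median.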

There  are  a  number  of  features  that  can  be  used  in  the 
tracker,  and  we  have  run  tests  using  various  combinations.  We 
have  found  that  two  features,  intensity  and  edge  magnitude,  are 
generally optimal.


The  edge  feature  is  an  approximation  to  the  Sobel  edge 
magnitude.  The  calculation  is  shown  in  Figure  3.  The  intensity 
feature  is  the  sampled,  filtered  video  intensity.  Scaling  of 
features  is  necessary  because  the  segmenter  uses,  at  most,  a 
total  of  8  bits  from  both  features  combined.  Typically,  we  use 
4  bits  of  intensity  and  4  bits  of  edge,  but  any  combination 
that  uses  no  more  than  8  bits  total  will  be  operator 
selectable.  The  limitation  to  eight  bits  is  partly 
implementation  driven,  but  also  driven  by  the  need  to  be  able 
to  estimate  joint  probability  density  functions  of  the 
features.  If  too  many  bits  are  allowed  then  the  amount  of  data 
needed  to  make  the  estimates  accurately  becomes  prohibitive.  We 
have  typically  used  a  total  of  7  or  8  bits  in  the  simulations, 
which  has  worked  well  even  for  small  targets  (which  generate 
less  data  to  estimate  the  probability  densities). 

The  scaler  scales  the  feature  samples  by  any  power  of  two 
(saturating  the  number  if  it  exceeds  the  range  of  legal 
numbers),  and  then  selects  the  specified  number  of  bits  from 
the most significant end of the number.

To  aid  the  processor  in  setting  the  power  of  two  scale 
factor,  the  scaler  measures  the  peak  value  of  each  feature 
within  the  track  window  every  frame. 
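
The scaling and combining steps can be sketched as follows. This is a minimal sketch under stated assumptions: the 8-bit sample width (WORD_BITS), the function names, and the peak-based rule in choose_shift are all hypothetical, since the text only says that the peak is measured to aid the processor in choosing the scale factor.

```python
WORD_BITS = 8  # assumed width of a feature sample; not stated in the text


def scale_feature(sample, shift, keep_bits):
    """Scale by 2**shift, saturate to WORD_BITS, keep the top keep_bits bits."""
    max_value = (1 << WORD_BITS) - 1
    scaled = sample << shift if shift >= 0 else sample >> -shift
    scaled = min(scaled, max_value)            # saturate instead of wrapping
    return scaled >> (WORD_BITS - keep_bits)   # most significant bits


def combine_features(intensity, edge, edge_bits=4):
    """Pack two scaled features into a single feature code X."""
    return (intensity << edge_bits) | edge


def choose_shift(peak):
    """Hypothetical rule: largest shift that keeps the peak within range."""
    shift = 0
    while peak > 0 and (peak << (shift + 1)) < (1 << WORD_BITS):
        shift += 1
    return shift


intensity_code = scale_feature(200, 1, 4)   # 400 saturates to 255, top 4 bits
edge_code = scale_feature(37, 0, 4)
x = combine_features(intensity_code, edge_code)
```

With 4 bits of intensity and 4 bits of edge, the combined code X takes one of 256 values, which is what keeps the joint histograms small enough to estimate from a single track window.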

3.2  SEGMENTATION 

The  tracker  uses  a  target  segmentation  technique  based  on 
optimum  decision  theory.  It  is  a  widely  applicable  technique  in 
that  the  decision  rule  is  derived  totally  from  the  video  with 
the  only  assumption  being  that  the  target  is  initially 
designated  to  the  tracker.  No  assumption  is  implicit  that  the 
target  is  the  brightest  (or  hottest)  object  in  the  field  of 
view.  The  tracker  can,  of  course,  track  such  targets,  because 
it  quickly  learns  that  the  target  is  the  brightest  object. 


The derivation of the segmentation decision rule will be
described in several steps. First, the hypothesis test is
stated. Several solutions to the hypothesis test are possible,
depending on what criterion is used to make the decisions. The
criterion selected for this tracker is similar to the
Neyman-Pearson criterion, so it is described first. Then the
modified segmentation criterion used by the tracker to derive
the decision rule is described. Finally, the methods by which
the theory is actually applied in the tracker are discussed.

Hypothesis Test - The formulation of the segmentation
process begins with a description of the segmenter's job in the
form of a hypothesis test: the segmenter is given samples of
the feature vector, X, at each pixel position. The segmenter
must classify the pixel as either target or background. One of
the two choices must be made. The assumption is now made that
the conditional probability densities are known:

p(X|T) - Probability of observing X, given that the
underlying pixel is target.

p(X|B) - Probability of observing X, given that the
underlying pixel is background.

These densities will be estimated by the segmenter with
histograms.

Decision Criteria - Several criteria for making the
decision are possible. In the past, we have used a Bayes risk
criterion to make the decision. With this approach, one
assigns costs to the classification mistakes (deciding target
when the pixel is actually part of the background, or vice
versa). More recently, we have settled on a criterion that is
similar to the Neyman-Pearson criterion. The concern is with the
probability of false alarm, Pf, which is the probability of
deciding target when the underlying pixel is background, and
the probability of detection, Pd, which is the probability of
deciding target when the underlying pixel actually is target.
With the Neyman-Pearson criterion, one constrains Pf <= k, and
designs a test to maximize Pd while satisfying the constraint.

The optimum solution to the Neyman-Pearson hypothesis test
is based on the likelihood ratio (for a derivation see Van Trees,
Detection, Estimation, and Modulation Theory, Part I, pp.
33-34):

              Target
 p(X|T)         >
--------             Lambda
 p(X|B)         <
            Background

The threshold Lambda is set so that Pf = k.

One complication occurs with the whole derivation, due to
the fact that what is actually estimated in the segmenter is
not p(X|T) and p(X|B), but instead the distributions within the
window and frame regions: p(X|W) and p(X|F). In order to apply
the above optimum test, the following assumption is made: the
target lies completely within the window region, leaving
only background in the frame. With this assumption, the
distributions within the window and frame can be expressed in
terms of the target and background distributions, as follows:

p(X|W) = A*p(X|B) + (1-A)*p(X|T)
p(X|F) = p(X|B)

These  equations  say  that  the  window  distribution  is  given 
by  a  weighted  sum  of  the  target  and  background  distributions, 
while  the  frame  distribution  is  equal  to  the  background 
distribution,  because  by  assumption,  no  target  is  allowed  in 
the  frame. 


The pair of equations can be solved for the target and
background distributions in terms of the window and frame
distributions, which can then be substituted into the
hypothesis test to give:

              Target
 p(X|W)         >
--------             Lambda'
 p(X|B)         <
            Background

where Lambda' = Lambda*(1-A) + A and, by the assumption above,
p(X|B) is estimated by p(X|F). The constant A is the fraction
of the window that is background. The value of A is not known
precisely, but it is not needed anyway, because the technique
used to set the threshold sets Lambda' directly.
Henceforth, the prime will be dropped and Lambda' will simply be
called Lambda.
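
For readers who want the intermediate algebra, the step from the target/background test to the window/frame test can be written out as follows. This is a reconstruction from the two mixture equations above, assuming 0 <= A < 1 and p(X|B) > 0; it is not reproduced from the original report.

```latex
\begin{align*}
p(X|W) &= A\,p(X|B) + (1-A)\,p(X|T)
  \;\Longrightarrow\;
  p(X|T) = \frac{p(X|W) - A\,p(X|B)}{1-A}, \\[4pt]
\frac{p(X|T)}{p(X|B)} > \Lambda
  &\iff \frac{1}{1-A}\left(\frac{p(X|W)}{p(X|B)} - A\right) > \Lambda \\
  &\iff \frac{p(X|W)}{p(X|B)} > \Lambda(1-A) + A = \Lambda'.
\end{align*}
```

Because the map from Lambda to Lambda' is increasing for A < 1, thresholding the window/frame ratio at Lambda' gives the same decisions as thresholding the target/background ratio at Lambda.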

The  above  discussion  has  placed  on  firm  ground  the  fact 
that the tracker uses window and frame distributions to
generate  the  decision  rule.  Because  the  segmenter  uses  a 
likelihood  ratio  test,  for  a  given  false  alarm  rate,  the 
probability  of  detection  is  maximized.  Now  the  method  by  which 
the  threshold,  Lambda,  is  set  will  be  described. 


As  Lambda  is  decreased,  the  resulting  decision  rule  will 
classify  an  increasing  number  of  features  as  target.  To  set 
Lambda,  we  apply  two  rules.  First,  we  require  that  at  least 
some  minimum  fraction  of  pixels  within  the  window  be  classified 
target.  So  we  automatically  lower  Lambda  until  that  fraction  is 
reached.  Second,  we  lower  the  value  of  Lambda  further  until 
some  maximum  fraction  of  pixels  in  the  frame  region  are 
classified  target.  This  second  rule  maximizes  the  fraction  of 
pixels  in  the  window  region  that  will  be  classified  target. 


The  effect  of  the  first  rule  is  that  even  in  situations 
which  are  very  difficult  to  segment,  such  as  high  clutter 
situations,  the  segmenter  will  at  least  classify  some  minimum 
amount  of  the  window  region  as  target.  The  effect  of  the  second 
rule  is  that,  in  easy  to  segment  situations,  the  segmenter  will 
be  able  to  pull  out  the  whole  target.  As  the  difficulty  of  the 
scene  increases  the  segmenter  will  back  off  and  classify  only 
the  most  likely  pixels  as  target.  In  simulations,  we  have 
settled  on  a  value  of  .15  for  the  minimum  fraction  of  target  in 
the  window.  Generally,  the  window  control  algorithm  will  drive 
the window size to an area such that the target fills between
.25 and .30 of the window. So only in very high clutter
situations will the segmenter be forced to exceed the desired
false alarm rate in the frame region. The maximum fraction of
target pixels in the frame region has been set to .015 in our
simulations.  The  application  of  this  theoretical  discussion  of 
decision  theory  by  the  tracker  will  now  be  described. 

Every  frame,  the  segmenter  collects  histograms  of  the 
features  within  the  window  and  frame  regions.  The  window 
histogram h(X|W) indicates the number of times the feature
combination X occurred within the window region, and the frame
histogram h(X|F) indicates the number of times the feature
combination X occurred within the frame region. These
histograms are normalized by the number of pixels in each of
the regions to give estimates of the probability distributions
in the window and frame regions. These instantaneous (single
frame) distributions will be denoted pi(X|W) and pi(X|F).

In  order  to  reduce  short  term  fluctuations  and  improve  the 
quality  of  the  estimates,  the  instantaneous  distributions  are 
filtered  in  time  over  several  video  frames  to  obtain  smoothed 
distributions, denoted ps(X|W) and ps(X|F). The filtering
operation is given by:


ps(X|W)[new] = C * ps(X|W)[old] + (1-C) * pi(X|W)
ps(X|F)[new] = C * ps(X|F)[old] + (1-C) * pi(X|F)


The  constant  C  is  chosen  to  provide  10  to  15  frames  of 
averaging,  with  exponential  weighting. 
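
A minimal Python sketch of this recursive smoothing is given below. The function names are hypothetical, and the relation between C and the number of frames averaged (an effective length of about 1/(1-C) frames) is a common rule of thumb assumed here, not a figure taken from the simulator.

```python
def normalize(hist):
    """Turn raw counts h(X|.) into an instantaneous distribution pi(X|.)."""
    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * len(hist)


def smooth_histogram(ps_old, pi_new, c):
    """One step of ps[new] = C * ps[old] + (1 - C) * pi for every feature X."""
    return [c * old + (1.0 - c) * new for old, new in zip(ps_old, pi_new)]


# Assumption: about 12 frames of averaging, so C = 1 - 1/12.
C = 1.0 - 1.0 / 12.0

ps = [0.25, 0.25, 0.25, 0.25]    # previous smoothed estimate (4 bins for brevity)
pi = normalize([8, 0, 0, 0])     # this frame's histogram, all counts in bin 0
ps = smooth_histogram(ps, pi, C)
```

Since both inputs sum to one and the weights C and 1-C sum to one, the smoothed estimate remains a valid distribution without renormalizing.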

The  likelihood  ratio  is  computed  for  all  X: 

        ps(X|W)
L(X) = ---------
        ps(X|F)


In  order  to  set  the  threshold,  we  need  to  be  able  to 
determine  for  any  value  of  Lambda,  how  many  pixels  will  be 
classified  target  in  both  the  window  and  frame  regions.  The 
functions of interest will be denoted N(Lambda|W) and
N(Lambda|F). They are defined by:

N(Lambda|W) = Sum h(X|W) over all X satisfying L(X) >= Lambda

N(Lambda|F) = Sum h(X|F) over all X satisfying L(X) >= Lambda
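
The two definitions translate directly into code. The sketch below evaluates them by brute force for one value of Lambda; the function names and the small floor on the denominator (to avoid dividing by zero for feature codes never seen in the frame region) are assumptions of this sketch.

```python
def likelihood_ratio(ps_window, ps_frame, floor=1e-6):
    """L(X) = ps(X|W) / ps(X|F) for every feature combination X.

    The floor on the denominator is an assumption added here; the text
    does not say how empty frame bins are handled.
    """
    return [w / max(f, floor) for w, f in zip(ps_window, ps_frame)]


def count_target(hist, ratios, lam):
    """N(Lambda|.) = sum of h(X|.) over all X with L(X) >= Lambda."""
    return sum(h for h, ratio in zip(hist, ratios) if ratio >= lam)


# Three feature codes for brevity; the tracker uses up to 256.
h_window = [10, 20, 70]
h_frame = [50, 40, 10]
ratios = likelihood_ratio([0.1, 0.2, 0.7], [0.5, 0.4, 0.1])

n_window = count_target(h_window, ratios, 1.0)
n_frame = count_target(h_frame, ratios, 1.0)
```

Evaluating these sums once per candidate Lambda is what would make a naive threshold search iterative.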


It  might  first  appear  that  some  sort  of  iterative  search 
for  the  threshold  is  necessary.  Such  an  approach  might  not  be 
feasible considering the fact that this computation must be
performed  60  times/second  in  the  real-time  version  of  the 
tracker.  The  following  somewhat  novel  approach  was  conceived  to 
allow  the  computation  to  be  non-iterative,  with  the  sacrifice 
of  accuracy  in  the  final  value  of  Lambda,  because  Lambda  will 
be  forced  to  one  of  a  finite  set  of  values. 


We can compute the N(Lambda|W) and N(Lambda|F) functions at
discrete values of Lambda = k*D with a two step process. First,
we compute auxiliary functions:


dN(k|W) = N(k*D|W) - N((k+1)*D|W)
dN(k|F) = N(k*D|F) - N((k+1)*D|F)


These functions act like histograms over the threshold: for
a given value of the threshold, they give the change in the
number of target pixels in the window and frame regions that
would be caused by decreasing Lambda by the amount D. The dN
functions are computed with the following algorithm:

For all k:
    dN(k|W) = 0
    dN(k|F) = 0

For all X:
    z = L(X) / D
    k = z truncated to an integer and limited to the range [1..256]
    dN(k|W) = dN(k|W) + h(X|W)
    dN(k|F) = dN(k|F) + h(X|F)

The functions N(Lambda|W) and N(Lambda|F) can now be
evaluated at the discrete points Lambda = k*D by

N(k*D|W) = Sum dN(i|W) over all i from k to 256
N(k*D|F) = Sum dN(i|F) over all i from k to 256

In practice these two functions are computed starting at
the maximum k (256) and then decreasing k until first the
number of target pixels in the window region, N(k*D|W), exceeds
Pd times the area of the window. Then k is possibly further
decreased, until the number of target pixels in the frame
region, N(k*D|F), would exceed Pf times the area of the frame
region. (Here Pd and Pf denote the minimum window fraction and
the maximum frame fraction described above.) The final value of
k is then used to compute the value of Lambda using
Lambda = k*D.
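
The binning and the downward scan might be sketched in Python as follows. This is an illustrative reading of the procedure, not the FORTRAN simulator code: the function names are hypothetical, the limits are passed as pixel counts rather than fractions, and the first rule stops as soon as the window count reaches (rather than strictly exceeds) its minimum.

```python
K_MAX = 256  # number of discrete threshold levels, as in the text


def threshold_bins(ratios, h_window, h_frame, d):
    """Accumulate dN(k|W) and dN(k|F); index 0 is unused so k runs 1..K_MAX."""
    dn_w = [0] * (K_MAX + 1)
    dn_f = [0] * (K_MAX + 1)
    for ratio, hw, hf in zip(ratios, h_window, h_frame):
        k = min(max(int(ratio / d), 1), K_MAX)   # truncate, then limit to [1, 256]
        dn_w[k] += hw
        dn_f[k] += hf
    return dn_w, dn_f


def select_threshold(dn_w, dn_f, min_window_count, max_frame_count):
    """Walk k down from K_MAX, applying the two rules in order."""
    n_w = n_f = 0
    k = K_MAX + 1
    # Rule 1: lower the threshold until enough window pixels are target.
    while k > 1 and n_w < min_window_count:
        k -= 1
        n_w += dn_w[k]
        n_f += dn_f[k]
    # Rule 2: keep lowering while the frame count stays within its limit.
    while k > 1 and n_f + dn_f[k - 1] <= max_frame_count:
        k -= 1
        n_w += dn_w[k]
        n_f += dn_f[k]
    return k, n_w, n_f


D = 0.25
ratios = [0.25, 0.5, 3.0, 7.0]      # L(X) for four feature codes
h_window = [10, 20, 30, 40]         # window area 100
h_frame = [600, 300, 90, 10]        # frame area 1000
dn_w, dn_f = threshold_bins(ratios, h_window, h_frame, D)
k, n_w, n_f = select_threshold(dn_w, dn_f, min_window_count=15, max_frame_count=15)
lam = k * D
```

Each feature code is visited once to fill the bins and each bin is visited at most once in the scan, so the cost per frame is fixed regardless of where the threshold ends up, which is the point of the non-iterative formulation.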


The  decision  rule  to  be  used  during  frame  i  is  computed 
during  frame  i-1  using  smoothed  statistics  up  to  frame  i-2  and 
using  the  instantaneous  histograms  from  frame  i-2  to  set  the 
threshold. 


The  decision  rule  is  stored  in  the  segmenter  in  the  form  of 
a  look  up  table  that,  for  each  feature  combination,  contains 
the  binary  decision.  Incoming  feature  combinations,  X,  are 
classified  by  looking  up  the  decision.  The  output  of  the 
segmenter  is  a  binary  image  consisting  of  the  one  or  zero 
decisions  for  each  pixel. 
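
The look up table form of the decision rule can be sketched as below. The names are hypothetical, and the table here has four entries for brevity; with 8 bits of combined features the table would have 256 entries.

```python
def build_decision_table(ratios, lam):
    """One binary decision per feature combination: 1 = target, 0 = background."""
    return [1 if ratio >= lam else 0 for ratio in ratios]


def segment(feature_image, table):
    """Classify every pixel by looking up its combined feature code X."""
    return [[table[x] for x in row] for row in feature_image]


table = build_decision_table([0.2, 0.5, 3.0, 7.0], lam=1.0)
feature_image = [
    [0, 3],
    [2, 1],
]
target_image = segment(feature_image, table)
```

All of the statistical work is done when the table is built, so the per-pixel cost during the frame is a single memory access.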


3.3  TARGET  IMAGE  PROCESSING 

Target  image  processing  entails  many  processes.  The  target 
image is a binary image that represents the segmenter's
decisions  as  to  which  pixels  are  target.  The  target  image  is 
processed  to  obtain  the  target  position  using  projections,  and 
the  target  size  using  an  adaptive  gate  process.  These 
estimations,  which  are  made  every  frame,  are  processed  by  a 
high  level  controller  in  order  to  predict  the  size  and  position 
of  the  target  in  subsequent  frames.  The  window  is  placed  in 
each  frame  in  the  position  where  the  target  is  anticipated  to 
be  on  that  frame.  The  next  section  will  describe  the  additional 
processing  on  the  target  image  for  breaklock  detection  and 
reacquisition.

For  the  computation  of  the  target  position,  first, 
projections  are  computed  within  the  target  window.  The  row 
projection  is  obtained  by  summing  the  number  of  target  pixels 
in  each  row  across  the  columns.  The  column  projection  is 
obtained  by  summing  the  number  of  target  pixels  in  each  column 


down  the  rows.  These  two  projection  functions  are  then 
processed to determine the 50% points, that is, the row and
column which split the area under the projection functions into
two equal halves.

The use of 50% points is preferred over a true centroid. A
true centroid (center of gravity) weights points that are
farther from the center more heavily than points that are
closer in to the center, so stray misclassified pixels far from
the target pull it more strongly than they pull the 50% point.
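
The projections and 50% points might be computed as in the following sketch. The function names are hypothetical, and the result is returned as an integer row or column index; whether the tracker interpolates between indices is not stated, so no interpolation is attempted here.

```python
def projections(target_image):
    """Row and column projections: counts of target pixels per row and column."""
    row_proj = [sum(row) for row in target_image]
    col_proj = [sum(col) for col in zip(*target_image)]
    return row_proj, col_proj


def fifty_percent_point(projection):
    """Index at which the running sum first reaches half of the total.

    Returns None when no pixel in the window is labeled target.
    """
    total = sum(projection)
    if total == 0:
        return None
    running = 0
    for index, count in enumerate(projection):
        running += count
        if 2 * running >= total:
            return index
    return None


target_image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
row_proj, col_proj = projections(target_image)
row_50 = fifty_percent_point(row_proj)
col_50 = fifty_percent_point(col_proj)
```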

The  position  of  the  target  is  processed  through  a  Kalman 
filter  of  order  two  (in  each  of  horizontal  and  vertical 
directions),  which  maintains  a  velocity  and  position  estimate 
of  the  target.  The  low  order  of  the  Kalman  filter  is  due 
partially  to  the  difficulty  of  modeling  the  target  motion  in 
the  sequences  that  we  have  been  using  in  the  simulator.  In  the 
sequences  we  use,  sensor  motion  is  also  responsible  for  motion 
of  the  target  in  the  scene.  At  some  later  point  in  time,  when 
we  actually  close  the  loop  and  the  tracker  drives  the  sensor 
pointing  angle,  we  will  have  to  revisit  the  issue  of  target 
modeling.  In  any  case  the  position  estimate  is  used  each  frame 
to  position  the  target  window  where  the  target  is  predicted  to 
be  according  to  the  Kalman  state  extrapolation.  Also,  the 
Kalman state vector is used during the reacquisition mode to
bias  the  search  for  the  target  to  where  the  target  would  most 
likely  be  located. 
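
A second order (position and velocity) Kalman filter for one image axis can be sketched as follows. This is a generic constant-velocity filter written under stated assumptions: the class name, the noise values q and r, the initial covariance, and the choice to apply process noise only to the velocity are all hypothetical and are not taken from the tracker.

```python
class AxisKalman:
    """Position/velocity Kalman filter for one image axis (constant velocity)."""

    def __init__(self, pos, q=0.01, r=1.0):
        self.pos, self.vel = float(pos), 0.0
        # Covariance entries: p11 = var(pos), p12 = cov(pos, vel), p22 = var(vel).
        self.p11, self.p12, self.p22 = 1.0, 0.0, 1.0
        self.q, self.r = q, r  # hypothetical process and measurement noise

    def predict(self):
        """Extrapolate one frame ahead; returns the predicted position."""
        self.pos += self.vel
        p11 = self.p11 + 2.0 * self.p12 + self.p22
        p12 = self.p12 + self.p22
        self.p11, self.p12, self.p22 = p11, p12, self.p22 + self.q
        return self.pos

    def update(self, measured_pos):
        """Correct the state with the measured position for this frame."""
        innovation = measured_pos - self.pos
        s = self.p11 + self.r
        k_pos, k_vel = self.p11 / s, self.p12 / s
        self.pos += k_pos * innovation
        self.vel += k_vel * innovation
        p11 = (1.0 - k_pos) * self.p11
        p12 = (1.0 - k_pos) * self.p12
        p22 = self.p22 - k_vel * self.p12
        self.p11, self.p12, self.p22 = p11, p12, p22


kalman = AxisKalman(pos=0.0)
for frame in range(1, 31):
    window_center = kalman.predict()   # where to place the window this frame
    kalman.update(2.0 * frame)         # measured 50% point, moving 2 pixels/frame
```

One filter of this kind would run for the horizontal axis and one for the vertical axis. During search, predict can be called without update to extrapolate the saved trajectory.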

The  dynamic  control  of  the  window  size  is  accomplished 
using  an  adaptive  gate  process  which  provides  a  means  of 
increasing  or  decreasing  the  size  of  the  window  as  the  target 
size  changes  due  to  target  motion  or  range  closure.  Edge  gates 
are  placed  in  positions  which  are  expected  to  lie  on  target 
boundaries.  Counts  of  target  pixels  are  taken  within  these 
gates.  The  size  of  the  window  is  controlled  by  a  servo  which 
drives  the  window  size  so  that  the  number  of  target  pixels 
balances  the  number  of  background  pixels  in  the  gates.  The 


imbalance  in  any  frame  is  passed  through  a  low  bandwidth 
digital  filter  to  generate  the  new  window  size.  The  filter 
includes two cascaded integrators, so the closed loop is
a type two servo. It can follow a target whose size is
expanding at a constant rate with zero residual error.
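
The behavior of a type two loop can be illustrated with a toy simulation. The sketch below is not the tracker's filter: the gains k1 and k2 are hypothetical, the direct (proportional) term is added here to keep the loop stable, and the gate imbalance is assumed to be proportional to the error between the true target size and the window size.

```python
class WindowSizeServo:
    """Type two loop: the gate imbalance is integrated twice to give the size."""

    def __init__(self, size, k1=0.3, k2=0.05):
        self.size = float(size)
        self.rate = 0.0              # output of the first integrator
        self.k1, self.k2 = k1, k2    # hypothetical loop gains

    def step(self, imbalance):
        """Advance one frame given the gate imbalance for that frame."""
        self.rate += self.k2 * imbalance               # first integrator
        self.size += self.rate + self.k1 * imbalance   # second integrator plus
        return self.size                               # a stabilizing direct term


# Toy closed loop: the target grows by 0.5 pixel per frame (range closure).
servo = WindowSizeServo(size=20.0)
for frame in range(300):
    true_size = 20.0 + 0.5 * frame
    servo.step(true_size - servo.size)
residual = (20.0 + 0.5 * 300) - servo.size
```

After the transient dies out, the first integrator holds the growth rate of the target, so the size error itself settles to zero even though the target keeps expanding.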

3.4  BREAKLOCK  DETECTION  AND  REACQUISITION

The breaklock detection and reacquisition system is provided
as a last resort to prevent loss of track. In simulations of
the tracker system, targets have been reacquired after
temporarily disappearing completely behind an occlusion. The
system should be able to handle reacquisition after loss of
track due to sensor jitter at launch.

Complete  execution  of  breaklock  detection  and  reacquisition 
occurs  in  three  phases,  corresponding  to  the  three  tracker 
modes: normal track, track while coast, and search. For the
detection of breaklock, a segmentation performance measure is
monitored. The performance measure is an estimate of the
probability of misclassifying a pixel in the target window. It
is computed by counting the number of target pixels in the
window and frame regions. When the tracker is in normal track
mode, the performance is monitored. If the performance measure
falls below a certain threshold in any given frame, then the
mode is switched to the track while coast mode. Whenever the
transition to the track while coast mode occurs, several key
target  parameters  are  saved:  the  histograms  for  computing  the 
decision  rule,  the  target  size,  and  the  target  trajectory  (in 
the  form  of  the  Kalman  state  vector  estimate).  While  in  the 
track  while  coast  mode,  tracking  continues  normally,  but  now  a 
running  average  of  the  performance  measure  is  computed.  After 
at  least  six  frames  in  this  mode,  the  average  performance 
measure  is  compared  to  two  thresholds.  If  it  rises  above  the 
first  threshold,  then  the  assumption  is  made  that  the  cause  of 


performance degradation was temporary and therefore ignorable,
and the tracker returns to normal track. If, on the other hand,
the  performance  measure  falls  below  the  second  threshold  then 
the  assumption  is  made  that  breaklock  has  occurred.  In  this 
case  the  tracker  enters  the  search  mode  to  try  to  reacquire  the 
target. 
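
The mode transitions described above can be summarized as a small state machine. In this sketch all threshold values are hypothetical, the performance measure is treated as a score for which larger is better (consistent with "falls below a certain threshold"), and the saving of histograms, target size, and Kalman state is indicated only by a comment.

```python
NORMAL, COAST, SEARCH = "normal track", "track while coast", "search"


class ModeControl:
    """Mode logic for breaklock detection; all thresholds are hypothetical."""

    def __init__(self, degrade=0.5, recover=0.6, lost=0.4, min_coast_frames=6):
        self.mode = NORMAL
        self.degrade, self.recover, self.lost = degrade, recover, lost
        self.min_coast_frames = min_coast_frames
        self.coast_measures = []

    def step(self, measure):
        """Update the mode from this frame's performance measure."""
        if self.mode == NORMAL:
            if measure < self.degrade:
                # The tracker would save histograms, size, and Kalman state here.
                self.mode = COAST
                self.coast_measures = []
        elif self.mode == COAST:
            self.coast_measures.append(measure)
            if len(self.coast_measures) >= self.min_coast_frames:
                average = sum(self.coast_measures) / len(self.coast_measures)
                if average > self.recover:
                    self.mode = NORMAL
                elif average < self.lost:
                    self.mode = SEARCH
        return self.mode


control = ModeControl()
modes = [control.step(m) for m in [0.9, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]]
```

Using two separated thresholds on the running average gives hysteresis: an average that lies between them leaves the tracker in the track while coast mode until the evidence is clearer.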

The  search  for  the  target  is  made  within  the  binary  target 
image.  The  decision  rule  used  to  generate  the  target  image  is 
computed  using  the  histograms  that  were  saved  at  the  time  of 
initial  performance  degradation.  The  binary  mask  that  is 
correlated  with  the  target  image  is  shown  in  Figure  4.  The  size 
of  the  binary  mask  is  set  according  to  the  size  of  the  target. 
The mask shape and size are such that the correlation will peak
when the mask is centered on a blob of target pixels of about
the correct size.
Successful  reacquisition  of  the  target  is  detected  by  comparing 
the  peak  correlation  to  a  threshold.  The  final  piece  of 
information  that  is  used  to  help  find  the  target  is  the  known 
target  trajectory.  The  trajectory  is  used  to  generate  a  penalty 
function  that  is  subtracted  from  the  spatial  correlation 
function.  The  effect  of  the  penalty  function  is  to  bias  the 
search  towards  the  predicted  target  position  and  to  limit  the 
region  searched.  The  search  region  is  steadily  increased  for 
every  frame  that  the  target  goes  undetected.  Eventually  (after 
a  couple  of  seconds)  the  search  region  encompasses  the  complete 
field  of  view. 
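
The search can be illustrated with the sketch below. The mask of Figure 4 is not reproduced in this text, so the mask used here (plus one for target pixels inside a square box, minus one for target pixels in a one-pixel ring around it) is purely hypothetical, as are the city-block distance penalty, the function names, and the parameter values. The sketch also compares the penalized score, rather than the raw correlation peak, to the acceptance threshold.

```python
def correlate_at(target_image, top, left, size):
    """Score a size-by-size box: +1 per target pixel inside the box, -1 per
    target pixel in the one-pixel ring around it (hypothetical mask)."""
    rows, cols = len(target_image), len(target_image[0])
    score = 0
    for i in range(top - 1, top + size + 1):
        for j in range(left - 1, left + size + 1):
            if 0 <= i < rows and 0 <= j < cols and target_image[i][j]:
                inside = top <= i < top + size and left <= j < left + size
                score += 1 if inside else -1
    return score


def search(target_image, size, predicted, penalty_per_pixel, accept):
    """Return the best (row, col) box origin, or None if the peak is too low."""
    rows, cols = len(target_image), len(target_image[0])
    best, best_score = None, None
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            # Penalty grows with distance from the predicted position.
            distance = abs(top - predicted[0]) + abs(left - predicted[1])
            score = (correlate_at(target_image, top, left, size)
                     - penalty_per_pixel * distance)
            if best_score is None or score > best_score:
                best, best_score = (top, left), score
    if best_score is not None and best_score >= accept:
        return best
    return None


# A 2 by 2 blob near the predicted position plus one isolated false alarm.
image = [[0] * 6 for _ in range(6)]
image[0][0] = 1
for i, j in [(3, 3), (3, 4), (4, 3), (4, 4)]:
    image[i][j] = 1
found = search(image, size=2, predicted=(2, 2), penalty_per_pixel=0.5, accept=2.5)
```

Lowering penalty_per_pixel on successive frames would have the effect described in the text of steadily widening the region searched.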


4.0  SUMMARY 


This  paper  has  attempted  to  describe  the  tracking  system 
that  has  been  developed  by  RCA.  Many  of  the  techniques  used 
were a result of the constraint to be able to process video in
real-time.  The  use  of  histograms  to  estimate  the  probability 
densities  involved  is  one  example.  The  simple  performance 
measure  used  to  detect  breaklock  is  another. 

Also, at times the only justification for a design decision
is human intuition or experience derived through the simulator.
An example of such a design tradeoff was the decision to use a
modified Neyman-Pearson criterion that guarantees at least a
minimum  probability  of  detection  no  matter  what  the  false  alarm 

Abstract


A statistically based tracking algorithm is described which utilizes a powerful segmentation
algorithm. Multiple features such as intensity, edge magnitude, and spatial frequency
are combined to form a joint probability distribution to characterize a region containing
a target and its immediate surround. These distributions are integrated over time to provide
a stable estimate of the target region and background statistics. A Bayesian decision
rule is implemented using these distributions to classify individual pixels as target or
nontarget. An adaptive gate process is used to estimate desired changes in the tracking
window size.


Introduction 


This paper documents progress during the past year toward the development and demonstration
of a statistical tracking algorithm. Papers [1, 2] presented in 1981 described some of
the initial concepts in this development. Since that time, the statistical tracking algorithm
has been expanded to incorporate (a) the simultaneous use of multiple features, (b) an
adaptive gate process for control of the window size, and (c) positional dependence of the
misclassification cost factor.

The  tracking  algorithm  is  based  on  the  use  of  multifeature  joint  probability  density 
functions  for  the  statistical  separation  of  targets  from  their  background.  The  features 
currently  being  used  are  intensity,  edge  magnitude,  and  a  pseudo  spatial  frequency  feature. 
These  features  are  combined  to  form  the  joint  distributions  which  characterize  a  target 
region  and  its  immediate  surround.  The  distributions  are  integrated  over  time  to  provide 
a stable estimate of the target and background statistics. A Bayesian decision rule is
implemented using these distributions to classify individual pixels as target or nontarget
within  a  tracking  window.  An  adaptive  gate  process  is  used  to  estimate  desired  changes  in 
the  tracking  window  size.  The  algorithm  at  present  assumes  manual  target  designation. 

RCA believes this tracking process is capable of operation in all environments; insensitive
to target type, signature, and orientation; applicable to a variety of sensors;
extendable to multisensor processing; and readily implementable.

Preprocessing  and  A/D  conversion 

The  video  preprocessing  function  is  an  important  part  of  any  imaging  sensor  system,  but 
is  more  critical  when  the  sensor  is  an  IR  device  which  may  exhibit  very  high  dynamic  range 
capability.  In  this  case  it  is  insufficient  to  perform  a  simple  AGC  based  upon  global 
statistics  because  the  subsequent  rescaling  to  reduce  the  dynamic  range  will  destroy  the  low 
contrast  local  detail.  Instead,  some  form  of  local  adaptive  contrast  enhancement  should  be 
applied in which the gain varies with the local contrast. Lo [3] simulated and compared
several such techniques.

Although  necessary  in  a  hardware  implementation,  this  function  has  not  been  included  in 
the  simulations  reported  here.  Ten-second  image  sequences  were  digitized  from  video  tape 
via  an  analog  video  disc  and  an  image  processing  system.  The  input  to  the  image  processing 
system  was  passed  through  a  video  processing  amplifier  so  that  the  levels  could  be  properly 
matched  to  the  A/D  converter. 

Statistical  tracking  algorithm 

Targets are often separated from their background by a simple thresholding scheme. Sometimes
the computation of the threshold is quite sophisticated and involves looking at the
statistics of the video signal. However, thresholding is inherently limited in ability, as
can be seen from the diagrams in Fig. 1. A simple black and white target can be readily
thresholded to isolate it from its background. On the other hand, a gray target cannot be
thresholded without using a pair of thresholds properly placed to contain the intensity
levels on the target. This dual threshold in itself is not prohibitive; rather, the problem
lies in the ability to place the thresholds at the appropriate levels.

[Fig. 1 panels, horizontal axis: intensity. Panel (b): this target cannot be easily
thresholded but requires a pair of thresholds properly placed.]

Fig. 1. Example showing two postulated targets. One is easily segmented from the background
using a single threshold. The other, however, requires two thresholds
which are not easily determined. The statistical process provides a separate
threshold for each intensity level.

The statistical segmentation process is a technique which provides an improved method for
extracting the target from its background. Figure 2 depicts this process. Shown are two
histograms, one taken from a window area of the image containing the target and the other
taken from the immediate surround which represents the background. A single feature,
intensity, is shown in these histograms for illustrative purposes. The shapes of the
distributions shown are arbitrary; no assumptions are made about their actual shape. The
segmentation process makes a separate assessment of each bin in the histogram to determine
if pixels whose intensity falls in the bin are more likely to be target or background. In
addition to solving the threshold selection problem, the statistical tracking algorithm
provides a method to both simplify the multimode tracking concept and provide added capability.
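
The bin-by-bin assessment can be illustrated with a single-feature sketch. It assumes the simplest possible comparison (a bin is labeled target when its normalized window count exceeds its normalized surround count); the cost-weighted rule developed later in the paper replaces this comparison. The function names and the 16-bin quantization are choices made for the example.

```python
import numpy as np

def classify_bins(window_pixels, frame_pixels, n_bins=16, max_value=256):
    """Label each histogram bin True (target) or False (background)."""
    edges = np.linspace(0, max_value, n_bins + 1)
    h_win, _ = np.histogram(window_pixels, bins=edges)
    h_frm, _ = np.histogram(frame_pixels, bins=edges)
    # Normalize by region size so the two regions are comparable.
    p_win = h_win / max(h_win.sum(), 1)
    p_frm = h_frm / max(h_frm.sum(), 1)
    return p_win > p_frm, edges

def segment(pixels, target_bins, edges):
    """Look up the per-bin decision for every pixel."""
    idx = np.clip(np.digitize(pixels, edges) - 1, 0, len(target_bins) - 1)
    return target_bins[idx]
```

With a gray target sitting between dark and bright background levels, only the bin holding the gray level is labeled target, which is the two-threshold case of Fig. 1 handled without choosing either threshold explicitly.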

The simplification comes about in the following way. State-of-the-art multimode trackers
typically operate a contrast, edge, and correlation tracker in parallel. An executive
process may be defined to determine at any given time which tracking mode is providing
the most reliable estimate of target position. The statistical process, as currently
defined, eliminates this mode polling process by combining the available features into
multidimensional statistics representing target and background. Consider the use of intensity
and edge magnitude as the two candidate features. In this case the statistical approach
encompasses three tracking modes in an integrated single mode without the need to poll the
performance of the individual processes. When intensity is the best target/background
separator, the algorithm operates like a contrast tracker. When edge magnitude is
predominant, it operates similarly to an edge centroid tracker. Because the process is searching
for pixels in the current frame that are statistically similar to those pixels selected as
target in previous frames, the algorithm is in a sense a correlation type process as well.

The  added  capability  comes  from  the  fact  that  there  are  target/background  conditions 
which  are  inseparable  using  two  features  independently  but  are  readily  separable  using  the 
same  two  features  jointly.  This  is  illustrated  quite  simply  in  Fig.  3.  In  this  example, 
neither edge magnitude nor intensity can be used independently to separate the target from
background  because  both  flat  distributions  cover  the  entire  variable  range  for  both  features. 
On  the  other  hand,  the  joint  distribution  clearly  delineates  the  two  areas. 
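
A small numerical sketch makes the same point. The synthetic scene is an assumption chosen for the example: target pixels have intensity and edge magnitude both low or both high, background pixels have one low and one high, and each feature is quantized to 8 levels. All names are hypothetical.

```python
import numpy as np

def make_scene(n=4000, seed=0):
    """Return (target, background), each a pair of (intensity, edge) arrays."""
    rng = np.random.default_rng(seed)
    half = rng.integers(0, 2, n)            # 0 = low half, 1 = high half

    def fine():
        return rng.integers(0, 4, n)        # position inside a half

    target = (fine() + 4 * half, fine() + 4 * half)
    background = (fine() + 4 * half, fine() + 4 * (1 - half))
    return target, background

def overlap(p, q):
    """Shared probability mass of two discrete densities (0 = separable)."""
    return float(np.minimum(p, q).sum())

def marginal(x):
    return np.bincount(x, minlength=8) / x.size

def joint(x, y):
    h, _, _ = np.histogram2d(x, y, bins=8, range=[[0, 8], [0, 8]])
    return h / x.size
```

For this scene the overlap of the single-feature histograms is close to 1 for both features, so neither feature alone separates the classes, while the overlap of the joint histograms is exactly 0 because target and background occupy disjoint cells.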

[Fig. 2 plot, vertical axis: number of pixels in each intensity group.]

Fig. 2. Example of how histograms are used to separate a target from its background. Each
bin in the histogram is examined to determine if the intensity values falling within
that bin are more likely to be target or background. Although this is a single
feature (intensity) example, the same process is used with multiple features in an
N-dimensional histogram representing a joint probability density.


[Fig. 3 panels: (a) joint distribution of intensity and edge magnitude for a postulated
target/background scene; single-feature distributions along the intensity and edge magnitude
axes (target and background distributions look alike).]

Fig. 3. Simple example showing how the use of joint statistics aids in the separation of
target from background in situations where the use of the features singly fails.

Figure 4 is a flow diagram of the statistical tracking mode. The preprocessed video is
used to generate multiple feature images to be used in the decision process. The features
are combined into two joint probability density functions for (a) a target tracking window
and (b) a background window frame. These distributions are the basis of a statistical
decision process which is used to classify the image pixels inside the tracking window to
separate the target from the background. In actuality the statistics from previous frames
are used in the classification process for the current frame. At the same time, histograms
are generated from the current image frame so that the statistics can be updated for
processing subsequent frames. At the end of the classification process the segmented image
is analyzed to determine the appropriate error signals as well as the window size and
position for the next frame. In parallel with the pixel rate computations for the Nth frame,
the statistics from the (N-1)st frame are integrated with past history and a decision rule is
generated for the (N+1)st frame.

A sample output from the process is shown in Fig. 5. Only two features were used in
this example, namely, intensity and edge magnitude. The total number of bits utilized for
the features is seven: four for intensity and three for edge magnitude. The edge magnitude
used is the absolute value approximation to the Sobel operator.

The  next  few  paragraphs  describe  some  of  the  steps  in  this  process  in  more  detail. 

Computation of features

The first step in the statistical process is the generation of the features to be used.
There are many potential candidates, some of which are computationally too burdensome for
real-time implementation at this time. We therefore have limited our selection of features
to those which are readily implemented. These features are intensity, edge magnitude, and
spatial frequency.

Fig. 4. Block diagram of the Bayesian statistical tracking mode (input: preprocessed video;
output: error signals). The feature computation, statistics generation, and pixel
classification are performed at the pixel rate. The computation of error signals is
performed during vertical sync.

Fig. 5. Sample output from the Bayesian statistical tracker simulation using a 64 x 64
pixel image of an aircraft at a mountain boundary. Two features were used in the
statistical segmentation with a total of seven bits.

The  intensity  feature  is  simply  a  requantized  version  of  the  digitized  video  signal  to 
obtain  the  desired  number  of  bits  of  intensity  resolution.  The  edge  magnitude  feature  is 
the sum of absolute values approximation to the Sobel operator. The absolute sum is an
acceptable approximation that is computationally more appealing than the true edge magnitude.
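
The first two features can be sketched as follows. The 3 x 3 Sobel kernels are the standard ones; the edge-replicating border handling, the bit-shift requantization, and the function names are assumptions of this example, not details given in the text.

```python
import numpy as np

def sobel_abs_sum(image):
    """Edge magnitude as |Gx| + |Gy| instead of sqrt(Gx**2 + Gy**2)."""
    img = np.asarray(image, dtype=np.int32)
    p = np.pad(img, 1, mode="edge")   # assumed border handling
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.abs(gx) + np.abs(gy)

def requantize(values, in_bits, out_bits):
    """Keep the out_bits most significant bits of an in_bits-wide value."""
    return np.asarray(values) >> (in_bits - out_bits)
```

The absolute sum needs only additions and sign changes, which is why it is attractive at the pixel rate; it overestimates the true magnitude by at most a factor of the square root of 2.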

The third feature is an approximation to spatial frequency in the horizontal direction.
Because it is a measure of object size, it could also be considered a simple texture
measure in a broad sense. The spatial frequency is defined as a function of the run
length, where a run is the number of consecutive pixels between which the pixel-to-pixel
difference does not exceed a predefined threshold. The threshold used is the mean value of
the absolute difference between pixels in the previous frame. The feature value is then
defined as:

$$ SF = \max\left[\,0,\ 2^{N} - \text{RUN LENGTH}\,\right] \qquad (1) $$

where $2^{N}$ is the number of levels into which the spatial frequency feature will be quantized.
An example of the spatial frequency feature is shown in Fig. 6. An arbitrary function
is plotted to represent the image intensity I at successive pixels in the x direction.
Beneath the plotted data are shown the actual pixel intensities, absolute differences, run
lengths, and feature values. The threshold used to compute run lengths in the example
is 1.3 and the number of quantization levels is 8 (N = 3 bits). The first sample-to-sample
difference which exceeds the threshold 1.3 occurs at the sixth sample. Samples 1 to 5 represent
a run of length 5 in which the differences do not exceed threshold. The corresponding
feature value is 3, which is assigned to all pixel locations in the run. The higher feature
values indicate smaller distances between gradient values exceeding threshold. Note that
the low amplitude variations between pixels 6 and 14 do not exceed the threshold and
therefore do not define the boundary of a run. The feature is intended to provide information
about the size (in the x direction) of areas or patches which have uniform or slowly
varying intensity.
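
The run-length computation for one image row can be sketched as follows. The sketch assumes that a run is closed by the first pixel whose difference from its left neighbor exceeds the threshold, which matches the example of a run covering samples 1 to 5 when the sixth sample breaks threshold; the function names are hypothetical.

```python
import numpy as np

def mean_abs_difference(frame):
    """Threshold source: mean horizontal pixel-to-pixel absolute difference."""
    return float(np.abs(np.diff(np.asarray(frame, dtype=float), axis=1)).mean())

def spatial_frequency_row(row, threshold, n_bits=3):
    """Assign max(0, 2**n_bits - run length) to every pixel of each run."""
    row = np.asarray(row, dtype=float)
    levels = 2 ** n_bits
    # A new run starts wherever the step from the previous pixel is too large.
    breaks = np.flatnonzero(np.abs(np.diff(row)) > threshold) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [row.size]))
    feature = np.empty(row.size, dtype=int)
    for s, e in zip(starts, ends):
        feature[s:e] = max(0, levels - (e - s))
    return feature
```

For the row `[1, 1, 1, 1, 1, 5, 5, 5]` with threshold 1.3 and 3 bits, the first five pixels form a run of length 5 and receive the value 3, and the last three form a run of length 3 and receive the value 5. Runs of 8 or more pixels saturate at 0.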


Generation  and  integration  of  statistics 

Histograms from two separate regions in the image must be computed to provide the probability
density functions required by the decision rule. The regions from which the histograms
are generated are shown in Fig. 7. The assumption in the segmentation algorithm is
that the target is absent from the frame region. For both the frame and window regions a
multifeature histogram is defined as

$$ H^{N}_{FR}(f_1, f_2, f_3) \quad \text{Frame Region Histogram} $$

$$ H^{N}_{WR}(f_1, f_2, f_3) \quad \text{Window Region Histogram} $$

for the Nth image in the sequence.

After normalization by the respective areas of the frame and window regions the histograms
become the discrete joint probability densities

$$ P^{N}_{FR}(f_1, f_2, f_3) \quad \text{and} \quad P^{N}_{WR}(f_1, f_2, f_3). $$

Fig. 6. Sample which shows the procedure for calculating the pseudo spatial frequency
feature. The absolute difference threshold used to compute run lengths in the example
is 1.3, which is the average difference. The number of quantization levels for the
feature is 8.

Fig. 7. Areas of the image over which the multifeature histograms are computed. It is
assumed that the target is absent from the frame region, which is defined as a border
around the window region containing the target.

To minimize short-term statistical variations these probability densities are combined
in  a  weighted  sum  with  the  past  history  of  the  statistics.  This  fading  memory  filtering 
is  performed  once  each  frame  so  that  the  statistical  updating  keeps  up  with  the  frame  rate 
of  the  video.  The  filtering  is  defined  by 
filtered probability density functions at the Nth frame,

P_FR, P_WR are the unfiltered density functions computed from the current frame, and

a, b are the weighting factors which control the amount of smoothing performed.

In  the  simulations  performed  to  date,  the  filtered  statistics  up  to  and  including  frame 
N-l  are  used  to  generate  the  decision  rule  to  be  used  on  frame  N. 
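
The statistics generation and smoothing can be sketched as follows. The filter form used here, filtered = a x (previous filtered) + b x (current unfiltered), is an assumption: it is one common first-order fading memory filter consistent with the description of a weighted sum with past history, and the assignment of a to the history and b to the current frame is a guess. Function names, default weights, and bin counts are hypothetical.

```python
import numpy as np

def normalized_histogram(f1, f2, f3, bins):
    """Joint histogram of three quantized features, normalized by region area.

    bins gives the number of levels of each feature; feature values are
    assumed to be integers in 0..levels-1.
    """
    sample = np.column_stack([np.ravel(f1), np.ravel(f2), np.ravel(f3)])
    edges = [np.arange(b + 1) for b in bins]
    hist, _ = np.histogramdd(sample, bins=edges)
    return hist / sample.shape[0]

def fading_memory_update(filtered_prev, current, a=0.75, b=0.25):
    """Assumed filter form: a weights the history, b the current frame."""
    return a * filtered_prev + b * current
```

Choosing a + b = 1 keeps the filtered array a valid probability density; a larger a gives heavier smoothing and a slower response to changes in the scene.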

Minimal  cost  decision  rule 

The  decision  rule  used  in  the  classification  of  pixels  as  target  or  background  inside  the 
window  region  is  based  upon  minimizing  the  cost  or  risk  associated  with  making  a  particular 
choice.  A  pixel  is  called  a  target  pixel  if,  and  only  if,  the  cost  (or  penalty)  associated 
with deciding background is greater than the cost associated with deciding target.
Mathematically this is written as follows:

Decide  target  if  and  only  if 

$$ P(B/X)\,C(B/B) + P(T/X)\,C(B/T) > P(B/X)\,C(T/B) + P(T/X)\,C(T/T) \qquad (4) $$

where

P(B/X) is the probability of a pixel being background given that the pixel has the
feature vector X.

P(T/X) is the probability of a pixel being target given that the pixel has the feature
vector X.

C(B/B) is the cost associated with classifying a background pixel as background.

C(T/T) is the cost associated with classifying a target pixel as target.

C(B/T) is the cost associated with classifying a target pixel as background.

C(T/B) is the cost associated with classifying a background pixel as target.

Clearly  C(B/B)  and  C(T/T)  are  zero  because  there  should  be  no  penalty  for  making  a  correct 
decision.  The  decision  rule  then  becomes  the  following: 

Decide target if and only if

$$ P(T/X)\,C(B/T) > P(B/X)\,C(T/B). \qquad (5) $$

Using Bayes' theorem this inequality is rewritten as

$$ P(X/T)\,P(T)\,C(B/T) > P(X/B)\,P(B)\,C(T/B). \qquad (6) $$

These distributions can be expressed in terms of the window and frame regions shown in
Fig. 7. Because the basic assumption is that the target is absent from the frame region,
the background distribution is the same as the frame region distribution; therefore,

$$ P(X/FR) = P(X/B). \qquad (7) $$

The window region contains both target and background; therefore

$$ P(X/WR) = P(B)\,P(X/B) + P(T)\,P(X/T) \qquad (8) $$
where P(B) and P(T) are the a priori probabilities of background and target, respectively,
in the window region.

Substituting P(X/B) from (7) and P(T)P(X/T) from (8) into (6) results in

$$ P(X/WR) > \left[\frac{C(T/B) + C(B/T)}{C(B/T)}\right] P(B)\,P(X/FR) \qquad (9) $$

which is a decision rule based upon the measurable distributions in the frame and window
regions. This is more simply written as

$$ P(X/WR) > \alpha A\,P(X/FR) \qquad (10) $$

where

$$ \alpha = P(B), \qquad A = \frac{C(T/B) + C(B/T)}{C(B/T)}. $$

The parameters A and α play an important role in the overall process. In the operation of
the tracking algorithm, α is maintained approximately constant by attempting to maintain a
fixed relationship between the size of the target and the window size. In subsequent
paragraphs it will be shown how the parameter A is used both as a control parameter and as
a means of introducing pixel position into the decision process. The parameter A is referred
to as the misclassification cost.

Window  size  and  position  control 

The dynamic control of the window size is accomplished using one of the most successful
techniques of modern trackers, namely the adaptive gate process. This provides a means of
increasing or decreasing the size of the window as the target size changes due to target
motion or range closure. The approach being used places the appropriate edge gates inside
the statistical tracking window as shown in Fig. 8. The central area defined by the heavy
black lines is the area in which the segmented target is confined. For ideal operation
the edge gates would contain half background and half target pixels. The adaptive gate
process is an attempt to balance the number of target and background pixels in the
horizontal and vertical edge gates independently to control the height and width of the window.
The unbalance in the edge gates is the difference between the number of target pixels and
background pixels. To control the horizontal window size this unbalance is used to either
expand or contract the window horizontally. If ΔW_E, as defined in Fig. 8, is positive,
there is more target area than background area in the edge gate regions, and the gates are
expanded. Conversely, if ΔW_E is negative there is more background area, which suggests
that the gates should contract. The effect is to make a change in the width of the central
region in which the target is being contained.
Recall that in the previous paragraph it was stated that α (the a priori probability of
background in the window region) could be maintained approximately constant. This is
accomplished during the window size control process. If ΔW_E is the desired change in width
of the central target area as derived from the edge gates, then the corresponding change in
the statistical tracking window width W is defined as

$$ \Delta W = \frac{\Delta W_E}{(1-\alpha)^{1/2}} \qquad (11) $$

A similar process defines the change in the window height. This process tries to maintain
a central target area which is (1-α) times the window area. The probability of background
inside the window is then approximated by α.

Fig. 8. Target edge gates are located inside the window area. The change ΔW_E in the
combined width of the two edge gates is related to the background vs. target area unbalance
within the gates. This change then defines the change ΔW to be made in the tracking window
width W to maintain a constant nominal value for α. A similar process defines the change ΔH
in the window height.
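
The width update can be sketched as follows. Equation (11) is taken from the text; the step that converts the edge gate unbalance into a width change (unbalance divided by gate height, times a gain) is an assumption made for this example, since the text says only that the two are related. All function and parameter names are hypothetical, and `alpha` stands for the a priori background probability.

```python
import math

def edge_gate_unbalance(gate_labels):
    """Target pixel count minus background pixel count in a pair of edge gates."""
    target = sum(1 for v in gate_labels if v)
    return target - (len(gate_labels) - target)

def edge_gate_width_change(unbalance, gate_height, gain=0.5):
    """Assumed mapping from pixel unbalance to a change in central width."""
    return gain * unbalance / gate_height

def window_width_change(delta_w_e, alpha):
    """Equation (11): scale the central-area change up to the full window."""
    return delta_w_e / math.sqrt(1.0 - alpha)
```

The square root appears because the central area is held at (1 - alpha) times the window area, so each linear dimension of the window is larger than the central region by a factor of 1 / sqrt(1 - alpha) when both dimensions are treated alike.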

The relative position of the tracking window with respect to the image frame is controlled
in one of two ways. In the case of bench test video tape input (or in fact any
input which is not directly controlled by the error signals) the error signals must drive
the position of the tracking window for the next frame. On the other hand, when the tracking
error signals are driving a seeker, the window position within the frame will vary only
in certain circumstances such as the initiation of a search and reacquisition strategy.

The  error  signals  are  computed  as  the  difference  between  the  centroid  of  the  segmented 
target  region  and  the  current  position  of  the  window  in  the  frame. 
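
A minimal sketch of the error signal computation follows, assuming the segmented target is available as a boolean mask over the tracking window and that positions are (row, column) pairs in frame coordinates. The function and argument names are hypothetical.

```python
import numpy as np

def error_signals(target_mask, window_top_left, window_center):
    """Centroid of the segmented target minus the current window position.

    Returns (row_error, col_error), or None when no target pixel was found.
    """
    rows, cols = np.nonzero(target_mask)
    if rows.size == 0:
        return None
    centroid_r = window_top_left[0] + rows.mean()
    centroid_c = window_top_left[1] + cols.mean()
    return (float(centroid_r - window_center[0]),
            float(centroid_c - window_center[1]))
```

With bench test input the returned errors would be added to the window position for the next frame; with a seeker in the loop they would drive the seeker instead.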

Misclassification cost control

The misclassification cost, A, is controlled in two ways. First, it is a function of the
pixel position relative to the expected aim point, and second, it is adjusted in a control
loop using the parameter α as a reference. The first in effect puts a positional dependency
into the decision rule so that the classification of a pixel as a target point is a function
of its location in the image relative to the current best estimate of target location. The
adaptive gate mechanism provides an ideal method for assigning relative weights to the two
cost factors C(T/B) and C(B/T). Figure 9 shows the layout of the frame and window regions
along with a plot in the X direction of the relative magnitude of the composite cost
function A. The area labeled R1 in the figure should correspond to the central area of the
target, assuming the adaptive gate window control function is performing properly. In this
region we expect mostly target pixels. The penalty for mislabeling a background pixel as
target in area R1 therefore should be less than the penalty for calling a target pixel
background. In the edge gate regions R2 there should be half background and half target pixels
in the ideal case; therefore the misclassification costs C(T/B) and C(B/T) should be equal.

In the area R3 between the edge gates and the window boundary very little target area is
anticipated. This provides a buffer zone between the target and the frame region, which is
assumed to contain no target data. Consequently, the relative magnitudes of the
misclassification costs should reverse. Finally, because the frame region should not contain any
target pixels, the penalty for misclassifying background pixels as target in this region
should be even higher.

A  similar  function  is  applied  in  the  Y  direction  and  the  actual  misclassification  factor 
is  the  larger  of  the  two.  This  does  not,  however,  set  the  actual  magnitude  of  A  which  is 
required  for  given  image  conditions.  Consequently  the  overall  amplitude  of  the  parameter  A 
is  made  adaptive.  This  is  done  in  the  following  way. 
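
The positional dependence can be sketched as a cost map built from one profile per axis. The cost ratios C(T/B)/C(B/T) assigned to the regions below are hypothetical values chosen only to respect the ordering described in the text (less than 1 in R1, equal to 1 in R2, greater than 1 in R3, greater still in the frame region); the relation A = 1 + C(T/B)/C(B/T) follows from the definition of A. Function names and the half-width parameterization are assumptions.

```python
import numpy as np

# Hypothetical ratios C(T/B) / C(B/T) for each region.
RATIO_R1, RATIO_R2, RATIO_R3, RATIO_FRAME = 0.5, 1.0, 2.0, 4.0

def axis_profile(coords, center, r1_half, r2_half, window_half):
    """Relative cost A = 1 + C(T/B)/C(B/T) along one axis."""
    d = np.abs(np.asarray(coords, dtype=float) - center)
    ratio = np.where(d <= r1_half, RATIO_R1,
             np.where(d <= r2_half, RATIO_R2,
              np.where(d <= window_half, RATIO_R3, RATIO_FRAME)))
    return 1.0 + ratio

def cost_map(shape, center_rc, half_sizes_y, half_sizes_x):
    """Per-pixel relative cost: the larger of the Y and X profiles."""
    ay = axis_profile(np.arange(shape[0]), center_rc[0], *half_sizes_y)
    ax = axis_profile(np.arange(shape[1]), center_rc[1], *half_sizes_x)
    return np.maximum(ay[:, None], ax[None, :])
```

This map fixes only the shape of the cost over the image; the overall amplitude is a separate multiplier that is adapted from frame to frame.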

The two parameters in the decision rule which impact the pixel classification in addition
to the statistics are the misclassification cost A and the a priori probability of
background in the tracking window, α. It is desirable to hold α at a constant value inside the
window. Therefore, (1 - α) is used as the reference parameter in a simple control loop as
shown in Fig. 10. After classifying the pixels in an image frame, the adaptive gate
process is used to set the window size for the next frame. The number of pixels expected
to be classified as target in the next frame is estimated to be the same as in the current
frame. Using the estimated target size and the window size calculated for the next frame,
the percentage target area expected in the next frame can be computed. The predicted target
area is compared to the desired reference (1 - α) to obtain an error which defines a scale
factor by which the cost function A is scaled. This adjustment will tend to improve
the classification in the next frame. Lowering the magnitude of A will cause more pixels
to be classified as target and vice versa.

Fig. 9. Diagram showing the positional dependence of the misclassification cost. A plot of
the relative amplitude of "A" as a function of the pixel position is shown for the X
direction.

Fig. 10. Control loop used to adjust the misclassification cost used in frame n (A_n) to a
new value to be used for frame n+1 (A_{n+1}).

SPIE Vol. 359 Applications of Digital Image Processing IV (1982) / 75
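
One step of the amplitude control loop of Fig. 10 can be sketched as follows. The text states only that the error between the predicted target fraction and the reference defines a scale factor; taking that factor to be the ratio of the two fractions raised to a gain, and clamping the result, are assumptions of this example. `alpha` stands for the a priori background probability, and all names are hypothetical.

```python
def update_cost_amplitude(cost, n_target_pixels, next_window_area, alpha,
                          gain=1.0, limits=(0.1, 10.0)):
    """Scale the cost amplitude so the target fraction moves toward 1 - alpha."""
    predicted_fraction = n_target_pixels / next_window_area
    desired_fraction = 1.0 - alpha
    # Too much predicted target area -> scale > 1 -> higher cost -> fewer
    # pixels classified as target in the next frame, and vice versa.
    scale = (predicted_fraction / desired_fraction) ** gain
    return min(max(cost * scale, limits[0]), limits[1])
```

A gain below 1 would make the loop respond more slowly and smooth out frame-to-frame jitter in the segmented target size.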

Conclusions 

A statistical tracking algorithm has been demonstrated via simulation which incorporates
the concepts of a multimode tracker in a single mode. The use of multifeature joint
probability distributions provides better target separation than using the same features
individually. The statistical process is insensitive to sensor type and operating scenario,
which provides a wide range of applicability without the need for application dependent
training. The technique is also insensitive to target orientation (such as that caused
by platform roll) because no specific target-related information is assumed. The algorithm
is directly extendable to multisensor operation, which would provide a wider range of
operating conditions.

Abstract


Performance  measures  for  statistical  segmentation  have  been  developed  for  a  space-and-time 
critical  Bayesian  statistical  tracker.  They  are  intended  to  become  an  integral  part  of  a 
knowledge-based  tracking  algorithm,  which  has  been  developed  by  RCA.  The  performance 
measures  are  serving  to  quantify  the  usefulness  of  the  processed  input,  to  assist  in  the 
identification  of  each  tracking  state  and  give  its  reliability,  and  to  predict  impending 
changes  of  state.  They  have  been  tested  using  stochastically  generated  target-background 
frames. Performance measure results have correlated well with the parameters which
characterize the difference in the target and background distributions. A host of possible
performance measures are discussed in relation to their strengths and weaknesses.
Experimental results for the measures currently being employed by RCA are given, and areas for
future  research  are  indicated. 


Introduction 

Space-and-time constrained tracking algorithms have suffered from a lack of intelligence.
Recent work on a statistical approach to 2-dimensional image segmentation using Bayesian
decision criteria has given hope that this deficiency can be remedied [1, 2]. During the past
two years the RCA Advanced Technology Laboratories has had considerable success in the
development of a Multifeature Bayesian Intelligent Tracker (MFBIT). An intelligent tracker
must adapt to rapidly changing tracking conditions. MFBIT is intended to handle a variety
of tracking strategies which are dependent on the tracking conditions. In addition, it must
identify tracking conditions and impending changes in them. A knowledge-based tracking
processor is being implemented which responds to the current input based upon a time-series
record of measures extracted from previous inputs. The tracking conditions are implemented
as a finite state automaton. The anticipated tracking states are (1) target acquisition,
(2) multiple targets, (3) breaklock, (4) target leaving field-of-view, and (5) target
confusion. The performance measures have been developed to add to the intelligence of MFBIT.
They will be utilized (1) to predict and identify changes of tracking state, (2) as an
experimental instrument to define each tracking state, (3) in a near-optimal allocation
scheme of computational resources to the competing features, and (4) in the determination of
Kalman weights by which the present input is integrated with the past to reduce wild
oscillations in tracking strategy.


Picture  segmentation 

Picture  segmentation  is  a  procedure  which  locates  the  target  and  background  domains  of  an 
image.  Each  image  is  initially  digitized,  to  form  a  2-dimensional  array  of  pixels.  Using 
appropriate  operators,  features  such  as  intensity,  edge  magnitude,  texture,  etc.,  can  be 
extracted. Based on its feature values, each pixel is assigned to the state-of-nature "target" or "background." A Bayesian decision rule determines the sets of feature values {JT} and {JB} which cause the pixels to be assigned the state-of-nature "target" or "background," respectively.

The  Bayesian  decision  rule  is  obtained  by  making  the  assumption  that  the  set  of  pixels 
belonging  to  the  state-of-nature  "target"  will  have  a  different  statistical  distribution  of 
feature  values  than  pixels  belonging  to  the  state-of-nature  "background."  Two  regions  are 
therefore defined for each frame in a sequence of images: a rectangular window region (WR) containing pixels which belong to both states-of-nature, enclosed in a frame region (FR) whose pixels have the state-of-nature "background" only (see Fig. 1). We obtain the probability density histogram, hFR(J), from the frame region. hFR(J) corresponds to the conditional probability, P(J/BR), that if the pixel is from the background region, then the chance variable will have the feature value J. The window region probability density histogram, hWR(J), is obtained. This probability, P(J/WR), that the chance variable will have a feature value J in the window region, is the expectation value, summed over both states-of-nature, of the conditional probability of feature value J.

Fig. 1. The window region contains both target and background pixels. The frame region contains background pixels only.

Because only the window region is segmented, P(J/WR) is simply written as P(J). Then,

P(J) = P(T) P(J/T) + P(B) P(J/B)    (1)

where P(T) is the a priori probability of the state-of-nature "target" in the WR and P(B) is the a priori probability of the state-of-nature "background" in the WR. For the sake of simplifying notation, let P(B) = α, P(T) = 1 - α, P(J/T) = hT(J), and P(J/B) = hB(J). If we now assume that hB(J) = hFR(J), and that we can reasonably estimate α, then equation 1 can be used to solve for hT(J):

h_T(J) = \frac{1}{1-\alpha} \left[ h_{WR}(J) - \alpha h_{FR}(J) \right]    (2)

Bayes theorem allows a calculation of a posteriori probabilities from a knowledge of the a priori probabilities:

P(T/J) = \frac{P(T) P(J/T)}{P(J)} = \frac{(1-\alpha) h_T(J)}{h_{WR}(J)}    (3)

and

P(B/J) = \frac{P(B) P(J/B)}{P(J)} = \frac{\alpha h_{FR}(J)}{h_{WR}(J)}    (4)
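
As an illustration of equations 2 through 4, the following is a minimal sketch that recovers the target histogram and the two a posteriori probabilities from the window and frame histograms. The function names, the clipping of negative estimates to zero, and the 4-bin example histograms are assumptions made for this sketch; they are not taken from the tracker simulator.

```python
def target_histogram(h_wr, h_fr, alpha):
    """Equation 2: h_T(J) = [h_WR(J) - alpha*h_FR(J)] / (1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Sampling noise can push an estimate below zero; clipping to zero is an
    # assumption of this sketch, not something the text prescribes.
    return [max(0.0, (w - alpha * f) / (1.0 - alpha)) for w, f in zip(h_wr, h_fr)]


def posteriors(h_wr, h_fr, alpha):
    """Equations 3 and 4: P(T/J) and P(B/J) for every feature value J."""
    h_t = target_histogram(h_wr, h_fr, alpha)
    p_t, p_b = [], []
    for w, f, t in zip(h_wr, h_fr, h_t):
        if w == 0.0:
            # Empty window bin: no pixel carries this feature value.
            p_t.append(0.0)
            p_b.append(0.0)
        else:
            p_t.append((1.0 - alpha) * t / w)
            p_b.append(alpha * f / w)
    return p_t, p_b


# Hypothetical example: background occupies the low bins, target the high bins.
h_fr = [0.5, 0.5, 0.0, 0.0]
h_t_true = [0.0, 0.0, 0.5, 0.5]
alpha = 0.75
h_wr = [alpha * f + (1.0 - alpha) * t for f, t in zip(h_fr, h_t_true)]
h_t = target_histogram(h_wr, h_fr, alpha)
p_t, p_b = posteriors(h_wr, h_fr, alpha)
```

In this noise-free example the estimate h_t reproduces h_t_true, because h_wr was built from equation 1 with the same α.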


The a posteriori probabilities are used in decision rules. Several are extant. The rule that a pixel whose feature value is J will be labeled "target" if

P(T/J) > P(B/J)    (5)

otherwise it will be labeled "background" is known as the maximum likelihood rule. Using equations 3 and 4 we may restate the rule. Label the pixel "target" if

\frac{h_T(J)}{h_{FR}(J)} > \frac{\alpha}{1-\alpha}    (6)

In the 1920s it was shown by Neyman and Pearson that optimum decision rules are formulated in terms of the ratio P(J/T)/P(J/B). This ratio, hT(J)/hFR(J) for the tracker, is known as the likelihood ratio. The value it must exceed (i.e., α/(1 - α)) is called the decision criterion. The maximum likelihood rule guarantees that the majority of the pixels are correctly labeled.

Another Bayesian decision rule is known as the minimum risk rule. The risk in labeling the pixels "target" or "background" is calculated by defining the misclassification costs C(B/T) and C(T/B). C(B/T) is the cost of labeling "background" a pixel whose state-of-nature is "target," and C(T/B) is the cost of labeling "target" a pixel whose state-of-nature is "background." The Bayesian risk in labeling "target" a pixel whose feature value is J is R(T/J,B) = C(T/B) P(B) P(J/B). Likewise, the Bayesian risk in labeling a pixel whose feature value is J as "background" is R(B/J,T) = C(B/T) P(T) P(J/T). The Bayesian decision rule is to label a pixel to minimize the risk. This leads to the rule: a pixel whose feature value is J will be labeled "target" if

C(T/B) P(B) P(J/B) < C(B/T) P(T) P(J/T)    (7)

otherwise, label the pixel "background."

Equation 7 demands that a pixel be labeled "background" unless

\frac{h_T(J)}{h_{FR}(J)} > \frac{\alpha}{1-\alpha} \cdot \frac{C(T/B)}{C(B/T)}    (8)

We can rewrite equation 8 by using equation 2:

\frac{h_{WR}(J)}{h_{FR}(J)} > \alpha \, \frac{C(B/T) + C(T/B)}{C(B/T)}

Define

A = \frac{C(B/T) + C(T/B)}{C(B/T)}

Thus, the decision criterion for a Bayesian classifier is Aα.

It is of interest to note that the minimum risk and maximum likelihood rules are identical for equal misclassification costs (i.e., C(T/B) = C(B/T)).
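
The minimum risk rule in its final form, hWR(J)/hFR(J) > Aα, can be sketched as follows. The function names and the 4-bin histograms are hypothetical, and the comparison is cross-multiplied so that an empty background bin does not cause a division by zero; that choice is an assumption of this sketch.

```python
def decision_factor(cost_tb, cost_bt):
    """A = [C(B/T) + C(T/B)] / C(B/T); A = 2 for equal costs."""
    return (cost_bt + cost_tb) / cost_bt


def target_bins(h_wr, h_fr, alpha, cost_tb=1.0, cost_bt=1.0):
    """Return the feature values J labeled "target": h_WR(J) > A*alpha*h_FR(J)."""
    threshold = decision_factor(cost_tb, cost_bt) * alpha
    labeled = []
    for j, (w, f) in enumerate(zip(h_wr, h_fr)):
        # Cross-multiplied form of h_WR(J)/h_FR(J) > A*alpha.
        if w > threshold * f:
            labeled.append(j)
    return labeled


# Hypothetical window and frame histograms with alpha = 0.75.
h_fr = [0.40, 0.40, 0.20, 0.00]
h_wr = [0.30, 0.35, 0.25, 0.10]
ml_set = target_bins(h_wr, h_fr, alpha=0.75)                  # equal costs, A = 2
cheap_fa_set = target_bins(h_wr, h_fr, alpha=0.75, cost_tb=0.2)  # A = 1.2
```

Lowering C(T/B) relative to C(B/T) lowers A, so more feature values are admitted to the target set.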

Estimation  of  performance 

The Bayesian decision rules partition the set of feature values {J} into two mutually exclusive sets, {JT} and {JB}. If a pixel in the window region has a feature value which belongs to the set {JT}, it will be labeled "target," regardless of its state-of-nature. In a like manner, if the pixel has a feature value belonging to the set {JB}, it will be labeled "background." Because the histograms hT(J) and hFR(J) usually overlap, this labeling scheme will lead to the misclassification of some pixels. The magnitude of the error, i.e., the number of pixels misclassified, will depend on the amount of overlap and on the value of the decision criterion, Aα. The number of misclassified pixels is a minimum if A = 2, the maximum likelihood decision rule. However, other values of A may minimize, in some fashion, the risk of misclassification. In any case, our choice of the decision criterion reflects our bias. Thus, performance measures which take the partitioning of the set {J} into account will be called biased performance measures. Performance measures which compare the shapes or functional forms of the histograms, hT(J) and hFR(J), will be called unbiased performance measures. Both types of estimations are useful.


Biased  performance  measures 


Hit  rate  and  false  alarm  rate 


Assume a decision rule divides observation space into two disjoint sets, {JT} and {JB}, as in Fig. 2.

It can be seen that

P(T/B) = \sum_{J \in \{J_T\}} h_{FR}(J)

is the probability that a pixel whose state-of-nature is "background" is labeled "target." Alternatively,

P(B/T) = \sum_{J \in \{J_B\}} h_T(J)

is the probability that a pixel whose state-of-nature is "target" is labeled "background."

P(T/T) = \sum_{J \in \{J_T\}} h_T(J) = 1 - P(B/T)

is the probability that a pixel whose state-of-nature is "target" will be labeled "target." P(T/T) is called the hit rate (HR), P(B/T) is the miss rate (MR), P(T/B) is the false alarm rate (FAR), and P(B/B) is called the correct rejection rate (CRR). Because we locate the centroid of the target by operating in some manner on the pixels labeled "target," it is evident that a false alarm is generally more costly than a miss. Note that HR and FAR cannot be independently varied. From Fig. 2 it is seen that they depend on each other implicitly through the decision criterion.

Fig. 2. Probability distribution histograms.
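
The four rates follow directly from the two histograms once the target set {JT} is fixed. A minimal sketch, with a hypothetical function name and example histograms:

```python
def rates(h_t, h_fr, target_set):
    """Return (HR, MR, FAR, CRR) for a set of bins labeled "target"."""
    hr = sum(h_t[j] for j in target_set)    # P(T/T): target mass inside {J_T}
    far = sum(h_fr[j] for j in target_set)  # P(T/B): background mass inside {J_T}
    return hr, 1.0 - hr, far, 1.0 - far


# Hypothetical histograms; bins 2 and 3 are labeled "target".
h_t = [0.0, 0.2, 0.4, 0.4]
h_fr = [0.4, 0.4, 0.2, 0.0]
hr, mr, far, crr = rates(h_t, h_fr, [2, 3])
```

Enlarging the target set can only raise HR and FAR together, which is the coupling through the decision criterion noted above.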

Bayes  risk,  error 

We can calculate the total risk of mislabeling pixels by calculating the risk of mislabeling background pixels,

R(T/B) = \sum_{J \in \{J_T\}} R(T/J,B) = \alpha C(T/B) (FAR)

and adding to it the risk of mislabeling target pixels,

R(B/T) = \sum_{J \in \{J_B\}} R(B/J,T) = (1-\alpha) C(B/T) (1 - HR)

The sum is known as Bayes risk:

Bayes = R(T/B) + R(B/T) = \alpha C(T/B)(FAR) + (1-\alpha) C(B/T)(MR)    (12)

The total number of misclassified pixels will be called ERROR:

ERROR = N [\alpha (FAR) + (1-\alpha)(MR)]    (13)

where N = total number of pixels in the window region.

ERROR is minimized for the maximum likelihood conditions C(T/B) = C(B/T). ERROR equals N times the shaded area under the graph (see Fig. 3). If the decision criterion is moved to the right, the decrease in false alarms is smaller than the increase in the misses. If the decision criterion is moved to the left, the decrease in misses is smaller than the increase in false alarms.
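
Equations 12 and 13 can be evaluated as in the sketch below. The function names and the numerical values (α = 0.75, N = 400, and the two candidate target sets with their FAR and MR) are hypothetical and chosen only to show that a poorly placed criterion yields more errors.

```python
def bayes_risk(alpha, far, mr, cost_tb=1.0, cost_bt=1.0):
    """Equation 12: alpha*C(T/B)*FAR + (1 - alpha)*C(B/T)*MR."""
    return alpha * cost_tb * far + (1.0 - alpha) * cost_bt * mr


def error_count(n_pixels, alpha, far, mr):
    """Equation 13: expected number of misclassified pixels in the window."""
    return n_pixels * (alpha * far + (1.0 - alpha) * mr)


# Hypothetical comparison of two decision criteria for the same histograms.
alpha, n_pixels = 0.75, 400
errors_wide = error_count(n_pixels, alpha, far=0.2, mr=0.2)    # larger target set
errors_narrow = error_count(n_pixels, alpha, far=0.0, mr=0.6)  # smaller target set
risk_wide = bayes_risk(alpha, far=0.2, mr=0.2)
```

With equal unit costs the Bayes risk is simply ERROR divided by N.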


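Another performance measure which is a function of the HR and FAR is the ratio of densities (ROD):

ROD = \frac{HR}{FAR} = \frac{\sum_{J \in \{J_T\}} h_T(J)}{\sum_{J \in \{J_T\}} h_{FR}(J)} = \frac{P(T/T)}{P(T/B)}    (14)

Fig. 3. Probability density distribution.

Because

\frac{h_T(J)}{h_{FR}(J)} \ge \frac{\alpha}{1-\alpha} \cdot \frac{C(T/B)}{C(B/T)}  if J \in \{J_T\},

it follows that

ROD = \frac{\sum_{J \in \{J_T\}} h_T(J)}{\sum_{J \in \{J_T\}} h_{FR}(J)} \ge \frac{\alpha}{1-\alpha} \cdot \frac{C(T/B)}{C(B/T)}

If ROD is large relative to α/(1 - α), the Bayesian classifier is performing well.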

It would obviously be helpful in the accurate location of the target if the number of false alarms were a small fraction of the total number of pixels labeled "target." Therefore, a performance measure called the false alarm fraction (FAF) has been defined:

FAF = \frac{\text{fraction of false alarms}}{\text{fraction of pixels labeled target}} = \frac{\alpha (FAR)}{\alpha (FAR) + (1-\alpha)(HR)} = \frac{\alpha \sum_{J \in \{J_T\}} h_{FR}(J)}{\sum_{J \in \{J_T\}} h_{WR}(J)}    (15)

In Fig. 3, if we move the decision criterion to the right one bin, the total number of errors increases, but the FAF decreases. We probably will locate the target centroid more accurately.
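
The ratio of densities and the false alarm fraction are both simple functions of HR, FAR and α. The sketch below uses hypothetical function names and example values; returning infinity or zero for the degenerate cases is an assumption of the sketch.

```python
def ratio_of_densities(hr, far):
    """Equation 14: ROD = HR / FAR (infinite when there are no false alarms)."""
    return float("inf") if far == 0.0 else hr / far


def false_alarm_fraction(alpha, hr, far):
    """Equation 15: share of the pixels labeled "target" that are background."""
    labeled = alpha * far + (1.0 - alpha) * hr
    return 0.0 if labeled == 0.0 else alpha * far / labeled


# Hypothetical values: alpha = 0.75, HR = 0.8, FAR = 0.2.
rod = ratio_of_densities(0.8, 0.2)
faf = false_alarm_fraction(0.75, 0.8, 0.2)
```

Here ROD = 4 exceeds α/(1 - α) = 3, yet about 43 percent of the pixels labeled "target" are false alarms, which illustrates why FAF is tracked separately when α is large.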


Unbiased  performance  measures 


These measures compare the histograms hT(J) and hFR(J), independently of the choice of the decision criterion. However, if these unbiased measures show that hT(J) and hFR(J) are extremely similar in functional form, no amount of cleverness will allow us to segment the image. On the other hand, if the unbiased measures show the histograms to be different, but the biased performance measures show that segmentation is poor, we have the opportunity to remediate. Thus, the unbiased performance measures can be used to monitor the performance of the biased performance measures. The two types of measures should agree.

Cos θ


A simply implemented yet effective measure called cos θ can be utilized. Suppose the chance variable J is partitioned into N grey levels. We may view hT(J) and hFR(J) as two N-dimensional, generalized vectors, hT and hFR. Their dot product would be hT · hFR = |hT| |hFR| cos θ = Σ hT(J) hFR(J), summed over J. Solving for cos θ we have

\cos\theta = \frac{\sum_{J} h_T(J) h_{FR}(J)}{\left[ \sum_{K=1}^{N} [h_T(K)]^2 \sum_{L=1}^{N} [h_{FR}(L)]^2 \right]^{1/2}}    (16)

Schwarz's inequality assures us that the numerator is always less than or equal to the denominator. Note that if hT(J) = c · hFR(J) for all J then cos θ = 1 and θ = 0°. If, on the other hand, hT(J) = 0 if hFR(J) ≠ 0 or hFR(J) = 0 if hT(J) ≠ 0, then cos θ = 0 and θ = 90°. In the first case either all the pixels will be labeled "background" or they will all be labeled "target."
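
Equation 16 is the cosine of the angle between the two histograms viewed as vectors. A minimal sketch, with a hypothetical function name and example histograms:

```python
import math


def cos_theta(h_t, h_fr):
    """Equation 16: normalized dot product of the target and frame histograms."""
    dot = sum(t * f for t, f in zip(h_t, h_fr))
    norm = math.sqrt(sum(t * t for t in h_t) * sum(f * f for f in h_fr))
    if norm == 0.0:
        raise ValueError("both histograms must contain at least one occupied bin")
    return dot / norm


# Hypothetical extremes: identical histograms and non-overlapping histograms.
same = cos_theta([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
disjoint = cos_theta([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])
```

Values near 1 indicate histograms of nearly the same shape (poor segmentation conditions); values near 0 indicate little overlap.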


Unreliability  parameter 


The unreliability parameter (UPAR) measures the probability that a target pixel is likely to be labeled background. We define Up(J) = P(B/J) P(J/T) as the probability that a pixel whose state-of-nature is target and whose feature value is J is labeled "background." Summing Up(J) over all J results in the unreliability parameter:

U_p = P(B/T) = \sum_{J} U_p(J) = \sum_{J} P(B/J) P(J/T)

However, P(J/T) = hT(J) and P(B/J) = α hFR(J)/hWR(J). Therefore,

U_p = \alpha \sum_{J} \frac{h_T(J) h_{FR}(J)}{h_{WR}(J)}    (17)

There are two extremums. First, hT(J) = hFR(J) for all J, and in this case, Up = α. Second, there is no overlap between hT(J) and hFR(J), i.e., hT(J) = 0 whenever hFR(J) ≠ 0 and hFR(J) = 0 whenever hT(J) ≠ 0, and in this case Up = 0. It will take on intermediate values for intermediate cases of overlap. However, if α approaches either one or zero, Up → α. Therefore, it is wise to scale Up by α. Values of Up close to zero indicate favorable conditions for segmentation, while values close to one indicate poor segmentation conditions.

Because Up = P(B/T), it is interesting to note that

P(T/B) = (1-\alpha) \sum_{J} \frac{h_T(J) h_{FR}(J)}{h_{WR}(J)} = \frac{1-\alpha}{\alpha} U_p

and thus there is a symmetry between the two performance measures.
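
Equation 17 can be evaluated as sketched below. The function name and example histograms are hypothetical, and hWR(J) is rebuilt from equation 1 rather than measured, which is an assumption made to keep the example self-contained.

```python
def unreliability(h_t, h_fr, alpha):
    """Equation 17: U_p = alpha * sum_J h_T(J) h_FR(J) / h_WR(J)."""
    total = 0.0
    for t, f in zip(h_t, h_fr):
        w = alpha * f + (1.0 - alpha) * t  # h_WR(J) from equation 1
        if w > 0.0:                        # empty bins contribute nothing
            total += t * f / w
    return alpha * total


# Hypothetical extremes with alpha = 0.75.
u_same = unreliability([0.2, 0.3, 0.5], [0.2, 0.3, 0.5], 0.75)
u_disjoint = unreliability([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], 0.75)
```

The two calls reproduce the extremums described above: Up = α for identical histograms and Up = 0 for non-overlapping ones, so dividing by α maps the measure onto the interval from 0 to 1.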


Weighted  second  moment 


The Weighted Second Moment (WSM) is derived by rewriting equation 1 in the following manner:

h_{WR}(J) = \alpha h_{FR}(J) + (1-\alpha) h_T(J)

If we define

y = \frac{h_{WR}(J)}{h_{FR}(J)}  and  x = \frac{h_T(J)}{h_{FR}(J)}

we have

y = \alpha + (1-\alpha) x    (18)

Equation 18 is the equation of a straight line with intercept α and slope (1 - α). The line goes through the point (1,1). The set of data points [x(J), y(J)] can be distributed in any fashion on the line. In fact the distribution of the points on the line is used as a performance measure. The point (1,1) has special significance. At that point, hT(J) = hWR(J) = hFR(J), and only the crudest type of image segmentation can occur. If the set of points {(x,y)} is confined to a small interval about the point (1,1), segmentation is poor. If, on the other hand, the data points are far from (1,1), the image is highly segmentable (see Fig. 4).

An error in α will not affect y, because hWR(J) and hFR(J) are obtained experimentally. The value of x will, however, be affected. If the calculated value of α (α_cal) is smaller than its correct value, α_c, the data point will appear closer to (1,1) and the performance measure will be more pessimistic than warranted. If we overestimate the value of α, α_cal > α_c, the data point appears further from (1,1). The performance measure will be overly optimistic (see Fig. 4).

Fig. 4. Linearized pdf equation.

The percent error in the value of [1 - x(J)] can be derived by defining α_cal = α_c + α_E

where α_cal = estimated value of α
α_c = correct value of α
α_E = the error in α

Also define (1 - x)_cal = the value of (1 - x) obtained by using α_cal and (1 - x)_c = the value of (1 - x) obtained by using α_c.

The fractional error

\frac{(1-x)_{cal} - (1-x)_c}{(1-x)_c} = \frac{\alpha_E}{1 - \alpha_{cal}}

and is independent of the point (x,y).

The fractional error in x, i.e., (x_cal - x_c)/x_cal, is, however, dependent on (x,y).
This discussion indicates strongly that the point (1,1) plays a unique role. To calculate the variance of the set of points {(x,y)} about the point (1,1) use the following:

WSM = \sum_{J} h_{WR}(J) \left\{ [1 - y(J)]^2 + [1 - x(J)]^2 \right\}    (19)

and from equation 18 it can be seen that

(1 - x) = \frac{1}{1-\alpha} (1 - y)    (20)

and therefore,

WSM = \frac{(1-\alpha)^2 + 1}{(1-\alpha)^2} \sum_{J} h_{WR}(J) [1 - y(J)]^2    (21)

The usefulness of WSM as a performance measure arises in the fact that the percent error in WSM due to an error in α is independent of the distribution of data points (x,y). Thus, if we were to compare the performance of two features in segmenting the image, say intensity and edge magnitude, the ratio of WSM for the two features would be independent of the value of α.
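
The α-independence of the WSM ratio can be checked numerically. The sketch below evaluates equation 21 for two hypothetical features under two different estimates of α; the function name, the histograms, and the decision to skip empty background bins (where y is undefined) are assumptions of the sketch.

```python
def wsm(h_wr, h_fr, alpha):
    """Equation 21: weighted second moment of the points (x, y) about (1, 1)."""
    total = 0.0
    for w, f in zip(h_wr, h_fr):
        if f == 0.0:
            continue  # y = h_WR/h_FR is undefined here; skipped by assumption
        y = w / f
        total += w * (1.0 - y) ** 2
    scale = ((1.0 - alpha) ** 2 + 1.0) / (1.0 - alpha) ** 2
    return scale * total


# Two hypothetical features sharing the same frame histogram.
h_fr = [0.5, 0.5]
h_wr_intensity = [0.6, 0.4]
h_wr_edge = [0.7, 0.3]
ratio_a = wsm(h_wr_intensity, h_fr, 0.75) / wsm(h_wr_edge, h_fr, 0.75)
ratio_b = wsm(h_wr_intensity, h_fr, 0.60) / wsm(h_wr_edge, h_fr, 0.60)
```

Because α enters only through the common prefactor of equation 21, ratio_a and ratio_b agree even though the two α estimates differ.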

Entropy 

Several entropy measures have been tested. The entropy H is defined as:

H = -\sum_{J=1}^{N} P(J) \log_2 P(J)    (22)

where P(J) = the probability of occurrence of feature value J and N = the number of feature values. H measures how evenly the feature values are occupied. If one feature value alone is occupied, then H = 0. If all feature values are evenly occupied, H = Hmax = log2 N (i.e., the number of bits allocated to the histogram). Obviously, for H = 0 an image is monotonic, but for H = Hmax the image contains the greatest possible variety. For example, if we count the number of unique adjacent pairs of intensity values an image contains, we would find just one unique pair for H = 0. However, we would find, on the average, the greatest number of unique pairs for an image when H = Hmax. A small value of H indicates one or more narrow peaks in the feature value histograms, whereas a large value of H indicates a large deviation of the feature values.
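
Equation 22 and the normalized form H/Hmax used in the tables and figures can be computed as in this sketch. The function names and example histograms are hypothetical; empty bins are skipped on the usual convention that 0 log 0 = 0.

```python
import math


def entropy_bits(p):
    """Equation 22 in bits; empty bins contribute nothing (0 log 0 = 0)."""
    return -sum(q * math.log2(q) for q in p if q > 0.0)


def normalized_entropy(p):
    """H / Hmax, where Hmax = log2 N for an N-bin histogram."""
    return entropy_bits(p) / math.log2(len(p))


# Hypothetical 8-bin (3-bit) histograms.
h_single = entropy_bits([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
h_uniform = entropy_bits([0.125] * 8)
ratio = normalized_entropy([0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

A histogram split evenly between two of eight bins carries one bit of entropy, one third of the three-bit maximum.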

The entropies for the window HW, target HT, and background histograms HB have been tested. In addition, ΔH = |HB - HT| has been used as a measure of the difference in the standard deviation of the background and target histograms. It is worth verifying if ΔH, combined with the difference in mean values between the background and target histograms, can parameterize the segmentability of the image. This approach is taken in analogy with the difference in means, and ratio of the standard deviations, completely parameterizing two Gaussian distributions.

Computer  simulations 

The  utility  of  the  performance  measures  was  tested  using  synthetic  imagery.  Two  types  of 
simulations  were  performed. 

Image  sequences 

Twenty-five sequences, each containing 50 frames, of a Gaussian target moving through a Gaussian background were generated on the HP-1000 and viewed on the I2S. The target and background statistics are characterized by the equations

h_T(X) = \frac{1}{\sqrt{2\pi}\,\sigma_T} \exp\left[ -\frac{(X - m_T)^2}{2\sigma_T^2} \right]  and  h_B(X) = \frac{1}{\sqrt{2\pi}\,\sigma_B} \exp\left[ -\frac{(X - m_B)^2}{2\sigma_B^2} \right]

where X = grey level (it is transformed into the appropriate quantized value J)
mT = the mean grey value of the target
σT = the standard deviation of the target grey values
mB = the mean grey value of the background
σB = the standard deviation of the background values.

Two parameters were used to characterize the segmentability of the images. They are

\Delta = \frac{m_B - m_T}{\sigma_T}  and  \rho = \frac{\sigma_B}{\sigma_T}

Fifteen sequences had unvarying statistics while 10 sequences had either Δ or ρ varying from frame to frame. The crudest segmentation procedures were implemented. α was set to 0.75 for all sequences. The maximum likelihood decision criterion was used, i.e., A = 2. No time series integration to minimize short-term statistical variation was attempted (see Table 1).

TABLE 1. CATALOGUE OF SYNTHETIC IMAGE SEQUENCES

Columns: Sequence Number; Mean Grey Level; No. of Grey Levels Spanned by σT; The Increment in Δ per Frame; The Increment in ρ per Frame


Single frames were generated on the VAX-780. The target and background statistics were again Gaussian, but the more general minimum risk rule was applied, i.e., A varied. In addition, frames which represented the whole range of α values were examined. Analytical results can be obtained for Gaussian distributions and these are compared with the results of the simulation. The analytical results can be thought of as the limiting case when the number of pixels in the image approaches infinity. For example, the performance measure cos θ can be expressed analytically in the following manner:

\cos\theta = \left[ \frac{2\rho}{1+\rho^2} \right]^{1/2} \exp\left[ -\frac{\Delta^2}{2(1+\rho^2)} \right]

Also, the locations of the decision criteria Jc can be analytically determined. The following two cases apply:

1. σT = σB

J_c = \frac{m_T + m_B}{2} + \frac{\sigma_T}{\Delta} \ln\left[ \frac{(1-\alpha)\, C(B/T)}{\alpha\, C(T/B)} \right]

where Δ = (mB - mT)/σT, the distance between the two means measured in standard deviations. As α → 0, Jc → +∞ and as α → 1, Jc → -∞.

2. σB = ρσT, ρ ≠ 1

J_c^{1,2} = m_T + \sigma_T \, \frac{ -\Delta \pm \left[ \rho^2 \Delta^2 + 2(\rho^2 - 1)\rho^2 \ln \frac{\rho (1-\alpha)\, C(B/T)}{\alpha\, C(T/B)} \right]^{1/2} }{ \rho^2 - 1 }

The two values of Jc will be Jc1 and Jc2.

Jc can have zero, one, or two solutions. Suppose ρ > 1. As α → 0, Jc1 → -∞ and Jc2 → +∞, that is, all {J} are to be labeled "target." As α becomes larger, the two values of Jc approach mT - σTΔ/(ρ² - 1) and the set narrows, Jc1 < {JT} < Jc2. When α reaches a critical value there is one solution, Jc1 = Jc2 = mT - σTΔ/(ρ² - 1). If α becomes larger still, there will be zero solutions and {J} is labeled "background." Suppose ρ < 1. As α → 0 there are no solutions and all J are labeled "target." As α reaches a critical value there will be one solution and Jc = mT - σTΔ/(ρ² - 1). As α becomes larger still, Jc1 < {JB} < Jc2. When α → 1 all {J} are labeled "background."

The HR and FAR can then be analytically determined using both the decision criteria calculated here and approximations to the Gaussian cumulative distribution function.
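
The analytical results for Gaussian statistics can be cross-checked numerically. The sketch below compares the closed form for cos θ with a finely sampled evaluation, and verifies that the equal-variance decision criterion sits where the likelihood ratio equals [α/(1 - α)][C(T/B)/C(B/T)]. All function names, the sampling grid, and the parameter values are assumptions of this sketch; the criterion is written in grey-level units with explicit mT, mB and σ.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def cos_theta_gaussian(delta, rho):
    """Closed form: sqrt(2*rho/(1+rho^2)) * exp(-delta^2 / (2*(1+rho^2)))."""
    return math.sqrt(2.0 * rho / (1.0 + rho * rho)) * math.exp(
        -delta * delta / (2.0 * (1.0 + rho * rho)))


def cos_theta_sampled(m_t, s_t, m_b, s_b, lo, hi, n=4000):
    """cos(theta) from densities sampled at n midpoints of [lo, hi]."""
    step = (hi - lo) / n
    xs = [lo + (i + 0.5) * step for i in range(n)]
    ht = [gaussian(x, m_t, s_t) for x in xs]
    hb = [gaussian(x, m_b, s_b) for x in xs]
    dot = sum(a * b for a, b in zip(ht, hb))
    return dot / math.sqrt(sum(a * a for a in ht) * sum(b * b for b in hb))


def criterion_equal_sigma(m_t, m_b, sigma, alpha, cost_tb=1.0, cost_bt=1.0):
    """Decision criterion J_c for the case sigma_T = sigma_B = sigma."""
    delta = (m_b - m_t) / sigma
    return 0.5 * (m_t + m_b) + (sigma / delta) * math.log(
        (1.0 - alpha) * cost_bt / (alpha * cost_tb))


# Hypothetical parameters: delta = 1, rho = 0.5 (m_T = 0, sigma_T = 1).
analytic = cos_theta_gaussian(1.0, 0.5)
sampled = cos_theta_sampled(0.0, 1.0, 1.0, 0.5, -10.0, 10.0)
j_c = criterion_equal_sigma(0.0, 2.0, 1.0, alpha=0.75)
ratio_at_jc = gaussian(j_c, 0.0, 1.0) / gaussian(j_c, 2.0, 1.0)
```

With α = 0.75 and equal costs, the likelihood ratio at J_c equals α/(1 - α) = 3, and the criterion lies on the target side of the midpoint between the two means.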

Results 


The  following  types  of  questions  have  been  addressed: 

1. How well do the performance measures correlate with the statistics of the images? In our case the statistics are completely specified by Δ, the difference in the means, and ρ, the ratio of the standard deviations of the target and background histograms.

2. How well does each calculated performance measure correlate with the actual performance of the algorithm? For example, how well does the calculated hit rate correlate with the actual hit rate?

3. How large is the range of values that a performance measure takes as a function of the variation in the statistics of the images?

4. Are any performance measures redundant? That is, are there pairs of performance measures which correlate well with each other?

Figures 5 through 9, obtained from the VAX-780 simulations, show the correlation of some of the performance measures with Δ and ρ. Figure 5 illustrates that the Gaussian statistics can be parameterized by Δ and ρ. The legend "infinite (∞) pixels" indicates the results were obtained analytically and are not subject to the statistical fluctuations which occur when a finite number of pixels are sampled.

Fig. 5. Gaussian background distributions as a function of Δ and ρ.
Fig. 6. Unreliability parameter vs. Δ.
Fig. 7. False alarm fraction (FAF) vs. Δ.
Fig. 8. Cos θ with ρ varying from 0.1 to 1.0.
Fig. 9. Cos θ with a varying number of pixels.

To test the performance measures fairly for a finite number of pixels, the image sequences which were generated on the HP-1000 were subjected to two operations. The means and standard deviations of all performance measures were calculated. These were compared to the parameters Δ and ρ which characterize the segmentability of the images. Each performance measure was calculated in two different ways. One value was obtained by presuming omniscience. The target and background histograms were obtained by scanning the target and background domains in the window region. The performance measures obtained in this manner are the control results. The other value was obtained by assuming the same ignorance of the target and background histograms that holds for the tracker. hT(J) and hB(J) were obtained from hFR(J) and hWR(J) and are the test results. The correlation function for the control and test results of each performance measure over the 50 frames of each sequence was calculated. These values are called the self-correlations. The results indicate that when the performance measures have a high standard deviation, such as those obtained for SEQ 622, the self-correlation is high (see Fig. 10 through 14). Note that the peaks in cos θ, UPAR and Bays coincide with each other and coincide with the valleys of HR. Note that Bays is less than 0.25 for the entire sequence because α = 0.75. The maximum likelihood criterion makes it probable that under very poor segmentation conditions, when Δ ≈ 0 and ρ ≈ 1, all pixels will be labeled "background" if α > 0.5. Thus, for poor segmentation conditions

BAYS ≈ (1 - α) for α > 0.5 and BAYS ≈ α for α < 0.5

and Bays is therefore in general ≤ 0.5.
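
The text does not state which correlation function was used for the self-correlations. The sketch below assumes the ordinary Pearson correlation coefficient between the control and test series of one performance measure; the function name and the short example series are hypothetical.

```python
import math


def self_correlation(control, test):
    """Pearson correlation of control and test values over the frames of a sequence.

    Using Pearson's coefficient is an assumption of this sketch.
    """
    n = len(control)
    if n != len(test) or n < 2:
        raise ValueError("need two series of equal length with at least two frames")
    mean_c = sum(control) / n
    mean_t = sum(test) / n
    cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(control, test))
    var_c = sum((c - mean_c) ** 2 for c in control)
    var_t = sum((t - mean_t) ** 2 for t in test)
    if var_c == 0.0 or var_t == 0.0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_c * var_t)


# Hypothetical hit-rate series: the test values track the control with an offset.
control = [0.10, 0.20, 0.30, 0.40]
test = [0.15, 0.25, 0.35, 0.45]
r = self_correlation(control, test)
```

Under this definition a measure whose values barely change from frame to frame has a small variance in the denominator, so noise dominates and the coefficient drops, which is consistent with the observation that low standard deviation goes with low self-correlation.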

Fig. 10. SEQ 622, cos θ.
Fig. 11. SEQ 622, hit rate.
Fig. 12. SEQ 622, UPAR.
Fig. 13. SEQ 622, Bays.
Fig. 14. SEQ 622, entropy, HW/(HW)max.
(Each plot is drawn against frame number.)

This argument implies that when α > 0.5 the errors tend to be misses rather than false alarms and vice versa. This may explain the lack of variation in the FAR for most of the data.

One phenomenon that cannot be explained at present is the coincidence of the peaks of the entropy in Fig. 14 with the valleys of the HR in Fig. 11. The general trend of the graphs, as expected, is the same, but the peaks should also coincide.

In summary, most of the performance measures appear to be useful. The WSM did poorly and needs to be dropped or reworked. Cos θ and UPAR are highly correlated and are therefore redundant. The FAR and FAF vary little due to α > 0.5. Both performance measures need to be tested further on imagery for which α < 0.5. For such imagery the HR is expected to deviate little. Performance measures such as FAR, FAF, and entropy, which have small standard deviations, can be useful as alarms that indicate a drastic variation in the tracking condition when they themselves change. Cos θ, Bays, HR, and UPAR appear to be the most sensitive performance measures.

Figures 15 through 20 are from SEQ 623. Note that the test performance measures vary more and on the average give a more optimistic estimate than the control. Also note the high correlation between cos θ and UPAR illustrated in Fig. 15.

The graphs from SEQ 628 (see Fig. 21 and 22) show that the performance measures are less sensitive to variations in ρ than to variations in Δ. Although the self-correlation is poor (0.657 for Fig. 21), note that the standard deviations are also smaller. Also note that the self-correlations would be considerably improved if some type of averaging or integration were performed.


SPIE Vol. 359 Applications of Digital Image Processing IV (1982) / 357


Fig. 15. SEQ 623, cos θ, UPAR; test results.
Fig. 16. SEQ 623, UPAR.
Fig. 17. SEQ 623, cos θ.
Fig. 18. SEQ 623, hit rate.
Fig. 19. SEQ 623, Bays.
Fig. 20. False alarm rate.
Fig. 21. SEQ 628, false alarm fraction.
Fig. 22. SEQ 628, cos θ.
(Plots show control and test curves against frame number.)


SEQ 610 and SEQ 629 have the same values for Δ and ρ. The difference between the sequences is that σT spans 10 grey levels for SEQ 610 and 40 grey levels for SEQ 629. HW differs considerably for the two sequences as illustrated in Table 2.

TABLE 2. NUMBER OF BITS VS. ENTROPY OF WINDOW

Number of bits    SEQ 610 HW/(HW)max    SEQ 629 HW/(HW)max
2                 0.05                  0.31
3                 0.07                  0.34
4                 0.29                  0.50
5                 0.32                  0.50
6                 0.34                  0.56

A comparison of the other performance measures shows that SEQ 629 is more segmentable than SEQ 610. Table 3 illustrates the correlation of HW with Δ and Table 4 does the same for ΔH and ρ. Note the variation of HW with Δ in Fig. 23 and note the higher average values when compared to Fig. 14. Figure 24 illustrates the correlation between HW and HR.


TABLE 3. ENTROPY OF THE WINDOW

Sequence Number    ρ      Δ      HW/(HW)max
SEQ 110            0.1    0.0    0.32
SEQ 113            0.1    0.5    0.21
SEQ 116            0.1    1.0    0.25
SEQ 119            0.1    2.0    0.26
SEQ 111            0.5    0.0    0.37
SEQ 114            0.5    0.5    0.37
SEQ 117            0.5    1.0    0.40
SEQ 120            0.5    2.0    0.45
SEQ 112            1.0    0.0    0.48
SEQ 115            1.0    0.5    0.48
SEQ 118            1.0    1.0    0.50
SEQ 121            1.0    2.0    0.55


TABLE 4. ΔH VS. ρ

(maximum entropy = 5.0)

Δ      ρ      ΔH* Experimental    ΔH* Control    Std. Dev. of ΔH Experimental    Std. Dev. of ΔH Control
0.5    0.1    2.33                2.35           0.10                            0.10
0.5    0.5    0.73                0.86           0.17                            0.10
0.5    1.0    0.35                0.10           0.31                            0.07
1.0    0.1    2.19                2.10           0.10                            0.11
1.0    0.5    0.60                0.85           0.26                            0.12
1.0    1.0    0.34                0.10           0.29                            0.07
2.0    0.1    2.34                2.34           0.07                            0.07
2.0    0.5    0.53                0.85           0.30                            0.10
2.0    1.0    0.55                0.08           0.42                            0.06

The final row corresponds to SEQ 621.

*Average difference between target and background entropies, H.

Future  research 


Fig. 23. SEQ 624, entropy - HW.
Fig. 24. SEQ 607, entropy - HW, hit rate.

Bimodal and trimodal synthetic Gaussian sequences will be generated and the performance measures will be tested. The performance measures will be integrated into MFBIT. The calculation of entropy is computationally costly and a replacement is being sought. What is needed is a function of the grey level distribution of the histograms which is an extremum when all grey levels are uniformly occupied. One possibility being considered is

\prod_{J=1}^{N} h(J)

where h(J) = number of pixels with grey level J, with the proviso that h(J') = 1 if no pixels occupy grey level J'. Another possibility is to calculate the average absolute deviation of the grey value occupation from a uniform distribution.

Conclusion 


It appears that the performance measures cos θ, UPAR, HR, FAR, and Bays are useful measures for statistical segmentation algorithms. Their validity and reliability have been proven by computer simulations. The entropy measures may be used to characterize the probability distribution functions (pdf) in a manner analogous to the characterization of Gaussian distributions by the mean and standard deviation. The entropy is less sensitive to variations in the pdf and can be used to signal drastic changes in the imagery. Some of the performance measures, such as cos θ and UPAR, are highly correlated with each other. It is therefore likely that only a subset of the performance measures will be implemented on MFBIT.